July 2013 - galaxy-dev - lists.galaxyproject.org

GALAXY LOCAL INSTALLATION: Remove history
by Rafael Hernández 15 Jul '13


Hi Galaxy devs,

I've installed Galaxy locally, and I see that even though users remove their histories using the "Delete history" option, the disk space they used is not freed. As I understand from the Galaxy documentation, after a history is deleted it is only marked as "Deleted" and its files are not removed. After 60 days, Galaxy automatically removes them:

> 1. If you delete a specific history using the **Options** link at the top of the history panel, that history and all of its associated datasets will be removed from disk 60 days after you deleted the history.
> 2. Those specific history items (datasets) that you delete from one of your histories by clicking the "X" icon in the history item will be removed from disk after 60 days, but unless you manually delete your history, you will still be able to view the history itself (only the dataset that you deleted from your history will be removed from disk).

I suppose this should also be the default behavior for local installations, but I did not configure anything for it, and I need to confirm that this is true because the used space grows every day! Is there any way to shorten this period to fewer days?

Thank you and regards,

Rafa Hernández de Diego
Genomics of Gene Expression Lab.
Bioinformatics and Genomics Department
Prince Felipe Research Centre (CIPF)
C/ Eduardo Primo Yúfera 3
46012 Valencia, Spain
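The retention rule quoted from the documentation in the message above (items are marked deleted first and removed from disk only after a fixed number of days) can be sketched as follows. This is only an illustration of the rule, not Galaxy's own cleanup code: the function name `is_purgeable` and the constant `RETENTION_DAYS` are hypothetical, and the 60-day value is simply the figure quoted in the message.

```python
from datetime import datetime, timedelta

# Retention period quoted in the documentation excerpt; treated here as a
# configurable value purely for illustration.
RETENTION_DAYS = 60


def is_purgeable(deleted_at, now, retention_days=RETENTION_DAYS):
    """Return True once an item has been marked deleted for at least
    retention_days. Items that were never deleted are never purgeable."""
    if deleted_at is None:
        return False
    return now - deleted_at >= timedelta(days=retention_days)


# A history deleted on 1 May 2013 is past the 60-day window on 15 Jul 2013,
# while one deleted on 1 Jul 2013 is not.
today = datetime(2013, 7, 15)
old_history = is_purgeable(datetime(2013, 5, 1), today)
recent_history = is_purgeable(datetime(2013, 7, 1), today)
```

Under this model, shortening the period only changes `retention_days`; whether and how a local installation actually runs such a cleanup is the open question in this thread.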


Galaxy Server Questions
by Jennifer Jackson 15 Jul '13


Hi Zain,

The minimum requirements for Galaxy are pretty much any recent model computer. A basic install will load and run within a very short amount of time, following the instructions at: http://getgalaxy.org

Advanced tools come from the Tool Shed. These are installed separately. No programming is needed, but some very basic unix skills are required, as they would be for ongoing server maintenance. The Tool Shed is one of the best documented parts of Galaxy, so you should be able to find all the answers here, or this is the right list to use for questions about local installs (galaxy-dev(a)bx.psu.edu): http://wiki.galaxyproject.org/Tool%20Shed

However, running a production instance that is intended to run compute-intensive tools (such as Tophat2), and where you have some throughput goals, will require more substantial resources. This is always a difficult question to answer since so much depends on the tools used and the data volume. But in general, minimum requirements are about the same as what those underlying tools would require on their own, if run from the command line.

So for Tophat or Tophat2, and really the entire Tuxedo RNA-seq tool package, you might be able to get by on 8G memory and 2 or 4 cores. But it will probably be slow, and if you are running replicates through Cuffdiff at the end, you might run out of memory if the files are large and the genome is large (such as human). And if you are hosting a web Galaxy at the same time with visualizations and such, well, this is why systems are often set up with clusters. With all of these competing for the same resources, going low can work, but this will be something that will have to be managed/tested, and it will change through time as tools upgrade.

You could test this out by setting up a cloud instance with the hardware you plan to use, loading some of your data, and running your workflow to see how this benchmarks. Have users on the instance while you are doing this, to judge performance. Cloud installations come with many of the advanced tools and data indexes already installed/configured, so this would be less investment than buying the hardware first and finding out later it was not enough. http://usegalaxy.org/cloud

And of course Slipstream Galaxy is an option. The whole intention here is to make a complete package with tools & data already configured, in a system that has enough compute capability to do the work, for scientists/labs who do not want to deal with administrative tasks or work in the cloud (for whatever reason). http://bioteam.net/slipstream/galaxy-edition/

Good luck with your decision! Others are welcome to comment about how they have set up their systems.

Jen
Galaxy team

On 7/15/13 7:14 AM, Zain A Alvi wrote:
> Hi Jen,
>
> I hope this reaches you well. I have a small question in regards to setting up a Galaxy server. My mentors and I are looking into buying a server for doing NGS analysis through the use of Galaxy. We saw that Slipstream's hardware specifications are the following:
>
> CPU: 2x Intel Xeon Processor E5-2690, 8 core (16 cores total)
> RAM Memory: 12x 32GB RDIMM (384 GB) with option to upgrade it to 512 GB
> Storage (Hard Drive space): 7x 3TB SAS 6 Gbps (16 TB usable) with 1 x 100GB Solid State Disk
> Power: Dual Redundant Power Supplies
> Network: Dual Gigabit Network Adapter
>
> We are wondering what the minimum server hardware specifications are that Galaxy can be run on.
>
> Our second question is: if we install Galaxy on the server, do all the tools currently available on Galaxy come pre-installed on it, or do we have to program (via Perl and Python) and install each of those tool sets ourselves? If we have to install those tools ourselves, is there a guide we can follow to do so? Lastly, how can we upgrade tool sets such as Tophat 1.44 to Tophat 2 on this server? I was wondering about the last question as Tophat 1.44 is available on the main Galaxy server whereas Tophat 2 is available on the test Galaxy server.
>
> Sorry for so many questions. Thank you again for all the great help.
>
> Sincerely,
>
> Zain

--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org


Error Uploading Directory of Files
by Nicholas Kline 15 Jul '13


Hi,

Our lab recently installed a local version of Galaxy on a mid-2012 Mac Pro computer. We can access the Galaxy server and sign in as an administrator.

Today we tried creating a Data Library, adding a dataset to it, and uploading a directory of files. We followed the Galaxy documentation at http://wiki.galaxyproject.org/Admin/DataLibraries/UploadingLibraryFiles?act… to set up this feature:

- Admin > Data Library > Add datasets > Upload directory of files
- file format was set to auto-detect
- and we chose the option to link to files instead of copying them

Galaxy confirmed that the files were successfully uploaded. However, in the data library, under the Message column, there is a message in red saying "Job error (click name for more info)". Clicking on one of the uploaded files displays a page with this information:

Date uploaded: 2013-06-28
File size: 7.5 GB
Data type: auto
Build: sacCer2
Miscellaneous information:
    Traceback (most recent call last):
      File "/Users/administrator/galaxy-dist/tools/data_source/upload.py", line 386, in <module>
        __main__()
      File "/Users/administrator/galaxy-dist/tools/data_source/upload.py", line 357, in __main__
        output_paths =
Job Standard Error:
    Traceback (most recent call last):
      File "/Users/administrator/galaxy-dist/tools/data_source/upload.py", line 386, in <module>
        __main__()
      File "/Users/administrator/galaxy-dist/tools/data_source/upload.py", line 357, in __main__
        output_paths = parse_outputs( sys.argv[4:] )
      File "/Users/administrator/galaxy-dist/tools/data_source/upload.py", line 64, in parse_outputs
        id, files_path, path = arg.split( ':', 2 )
    ValueError: need more than 1 value to unpack
error
Database/Build: sacCer2
Number of data lines: None
Disk file: /Volumes/G-SPEED Q/data/Person 2012 project/DCP2 mef.fastq

Questions:

1. Should we be concerned about this error?
2. If so, what is the right way to fix it?
3. If not, how do we remove the red error message next to each file: "Job error (click name for more info)"?

Thank you
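The ValueError in the traceback above can be reproduced in isolation. The sketch below assumes (this is not confirmed anywhere in the thread) that the spaces in the disk file path cause the output argument to reach upload.py in several pieces. The argument value and the helper names are hypothetical; only the `arg.split(':', 2)` unpacking is taken from the traceback.

```python
import shlex


def parse_output_arg(arg):
    """Unpack an 'id:files_path:path' argument the same way as the
    line shown in the traceback."""
    dataset_id, files_path, path = arg.split(":", 2)
    return dataset_id, files_path, path


def try_parse(arg):
    """Return the parsed tuple, or None if unpacking raises ValueError."""
    try:
        return parse_output_arg(arg)
    except ValueError:
        return None


# Hypothetical argument whose path contains spaces, like the one in the report.
whole_arg = "1:/tmp/files:/Volumes/G-SPEED Q/data/DCP2 mef.fastq"

# If the argument is placed on a command line without quoting, whitespace
# splitting breaks it into three separate arguments.
fragments = shlex.split(whole_arg)

parsed_whole = try_parse(whole_arg)          # three fields, parses fine
parsed_fragment = try_parse(fragments[1])    # 'Q/data/DCP2' has no ':' at all
```

If this is indeed the cause, a path without spaces (or a correctly quoted argument) would parse cleanly, which gives one way to test the hypothesis on a single file.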


[GSoC2013] Week 4 Accomplishments and Week 5 Plans
by Saket Choudhary 15 Jul '13


Hi All,

Week 4 blog post: http://galaxy-gsoc2013.blogspot.com/2013/07/week-4-accomplishments-and-week…

Do drop in your comments, if any.

Thanks,
Saket


No samtools build after building index through Data Manager.
by graham etherington (TSL) 15 Jul '13


Hi,

I've tried using the Data Manager (Admin > Data > Manage local data (beta)) to install builds for BWA and Samtools on my local Galaxy instance. Before using the Data Manager, I used to add the build to tool-data/shared/ucsc/builds.txt, create the .fai indexes (for samtools) from the command line, add them to tool-data/sam_fa_indices.loc and restart Galaxy (obviously doing a similar thing for BWA and adding the build to bwa_index.loc).

I thought I'd try using the Data Manager to add builds for BWA and Samtools. The BWA builds work fine (I can map to the build), but when I try to use SAM-to-BAM I get the error "Sequences are not currently available for the specified build."

Using the Data Manager creates the directory tool-data/n_sylvestris/, which contains the sub-dirs 'seq', 'bwa_index' and 'sam_index'. 'seq' contains a symlink to the n_sylvestris.fa sequence. 'sam_index' and 'bwa_index' both contain the sub-directory 'n_sylvestris', which contains a symlink to the symlink for n_sylvestris.fa in 'seq' along with their respective n_sylvestris.fa.xxx index files. OK - all good...

In tool-data/testtoolshed.g2.bx.psu.edu/repos/blankenberg/ there are three subdirectories:

data_manager_bwa_index_builder
data_manager_sam_fa_index_builder
data_manager_fetch_genome_all_fasta

All three directories contain all_fasta.loc, tool_data_table_conf.xml, tool_data_table_conf.xml.sample and (for the sam and bwa dirs) their pertinent index.loc file. The data_manager_fetch_genome_all_fasta/all_fasta.loc file contains the path to the fasta symlinks. The all_fasta.loc files in the sam and bwa data_manager_index_builder directories don't contain any uncommented lines. The index.loc files in the sam and bwa data_manager_index_builder directories point to:

tool-data/n_sylvestris/bwa_index/n_sylvestris/n_sylvestris.fa
tool-data/n_sylvestris/sam_index/n_sylvestris/n_sylvestris.fa

As BWA runs fine, it's obviously reading the bwa_index.loc file from the directory:

tool-data/testtoolshed.g2.bx.psu.edu/repos/blankenberg/data_manager_bwa_index_builder/fe6508204acc/bwa_index.loc

...but it's not reading the samtools indexes at:

tool-data/testtoolshed.g2.bx.psu.edu/repos/blankenberg/data_manager_sam_fa_index_builder/926e50397b83/sam_fa_indices.loc

For Galaxy to find the sam indexes, I have to go to the tool-data/sam_fa_indices.loc file and manually insert into it the contents of:

tool-data/testtoolshed.g2.bx.psu.edu/repos/blankenberg/data_manager_sam_fa_index_builder/926e50397b83/sam_fa_indices.loc

So, I guess my question is: other than inserting the genome builds into builds.txt, should I be doing any other configuration to get the Data Manager to write, and Galaxy to read, its newly created builds? I find it strange that the BWA builds work OK, but the Samtools ones don't.

I've done a few greps for mentions of .loc files in Galaxy, and the only difference between the bwa and sam .loc files is that there is a file tool-data/tool_data_table_conf.xml (plus a .sample version) which contains:

    <tables>
      <table name="sam_fa_indexes" comment_char="#">
        <columns>line_type, value, path</columns>
        <file path="tool-data/sam_fa_indices.loc" />
      </table>
    </tables>

Could Galaxy be reading this file and ignoring the one in tool-data/testtoolshed.g2.bx.psu.edu/repos/blankenberg/ ??

Best wishes,
Graham

Dr. Graham Etherington
Bioinformatics Support Officer,
The Sainsbury Laboratory,
Norwich Research Park,
Norwich NR4 7UH. UK
Tel: +44 (0)1603 450601
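To compare which .loc file each tool_data_table_conf.xml names, the table definitions can be listed with a few lines of Python. This is a minimal sketch using only the standard library; the helper name `loc_files_by_table` is hypothetical, and the XML is the fragment quoted in the message above. It shows what a configuration file declares, not which file Galaxy ends up reading at runtime.

```python
import xml.etree.ElementTree as ET

# The fragment quoted in the message, as found in
# tool-data/tool_data_table_conf.xml.
CONF = """<tables>
  <table name="sam_fa_indexes" comment_char="#">
    <columns>line_type, value, path</columns>
    <file path="tool-data/sam_fa_indices.loc" />
  </table>
</tables>"""


def loc_files_by_table(xml_text):
    """Map each <table> name to the list of .loc paths in its <file> elements."""
    root = ET.fromstring(xml_text)
    result = {}
    for table in root.findall("table"):
        paths = [file_el.get("path") for file_el in table.findall("file")]
        result[table.get("name")] = paths
    return result


tables = loc_files_by_table(CONF)
```

Running the same helper over the tool_data_table_conf.xml files in each data manager directory would show whether the 'sam_fa_indexes' table is declared with a different path there.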


Help with "Send output to Galaxy" function
by Huayan Gao 15 Jul '13


Hi,

I am trying to connect my local database/webpages to my local Galaxy server, but I am having a hard time implementing the "send output to Galaxy" function. Right now I have a sending page that sends a form to Galaxy, and it looks like everything goes well. From the variable json_params (data_source.py), I can see that all parameters have been received by Galaxy.

The problem is: how do I get my parameters in my getfile.php file (this is also used in the Galaxy_URL, as the callback function)? I tried to print out $_POST, $_GET and $_REQUEST, but nothing shows up. I did see some output from $_SERVER. I am using PHP.

Please help!

Best,
Huayan


sam_fa_indexes
by Ryan Davis 12 Jul '13


Hi,

We have just installed a local Galaxy instance on a workstation and I am having issues getting references to show up for my samtools section. I have set up many .loc files correctly for my other tools and I am able to see and use them in Galaxy. In the paster.log I see the following:

galaxy.tools.data WARNING 2013-07-10 17:05:06,034 Line 30 in tool data table 'sam_fa_indexes' is invalid (HINT: '<TAB>' characters must be used to separate fields): index hg19 /mnt/data3/Reference/human/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa
galaxy.tools.data WARNING 2013-07-10 17:05:06,034 Line 31 in tool data table 'sam_fa_indexes' is invalid (HINT: '<TAB>' characters must be used to separate fields): index mm10 /mnt/data3/Reference/mouse/Mus_musculus/UCSC/mm10/Sequence/WholeGenomeFasta/genome.fa
galaxy.tools.data DEBUG 2013-07-10 17:05:06,034 Loaded tool data table 'sam_fa_indexes'

I know for a fact that when I first created this file I had a space (not a TAB) between "index" and the build. After restarting the Galaxy service several times and double-checking our loc files, I noticed the error and replaced the space with a TAB. Using vi (with :set list) we see that there are TABs between the fields:

#index^Ihg18^I/depot/data2/galaxy/sam/hg18.fa$
#index^Ihg19^I/depot/data2/galaxy/sam/hg19.fa$
index^Ihg19^I/mnt/data3/Reference/human/Homo_sapiens/UCSC/hg19/Sequence/sam/hg19.fa$
index^Imm10^I/mnt/data3/Reference/mouse/Mus_musculus/UCSC/mm10/Sequence/sam/mm10.fa$

Even with the corrected file, I still get the same error in the paster.log. Does this get saved in the database? How do I resolve this?

Thanks for the help!
Ryan
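The tab check that the warning above describes can also be run outside Galaxy. The sketch below is a standalone approximation, not Galaxy's own loader: the function name `find_invalid_loc_lines` is hypothetical, the three-field requirement is taken from the 'sam_fa_indexes' examples in this message, and the sample paths are shortened for illustration.

```python
def find_invalid_loc_lines(lines, expected_fields=3, comment_char="#"):
    """Return the 1-based numbers of data lines that do not have at least
    expected_fields tab-separated fields. Blank lines and lines starting
    with comment_char are skipped."""
    bad = []
    for number, line in enumerate(lines, start=1):
        stripped = line.rstrip("\r\n")
        if not stripped.strip() or stripped.startswith(comment_char):
            continue
        if len(stripped.split("\t")) < expected_fields:
            bad.append(number)
    return bad


# Shortened, hypothetical sample: line 2 uses tabs, line 3 uses spaces.
sample = [
    "#index\thg18\t/depot/data2/galaxy/sam/hg18.fa\n",
    "index\thg19\t/mnt/data3/Reference/hg19.fa\n",
    "index mm10 /mnt/data3/Reference/mm10.fa\n",
]
invalid = find_invalid_loc_lines(sample)
```

To check a real file, pass the open file object as `lines`. Running it against every copy of the .loc file on the system is one way to see whether a stale copy still contains space-separated lines.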


Dynamically decide where a Galaxy tool runs
by Saliya Ekanayake 12 Jul '13


Hi,

I see in the documentation that Galaxy tools may be configured to run on a cluster (http://wiki.galaxyproject.org/Admin/Config/Performance/Cluster#DRMAA). This is something our research group is interested in, but in our case it is not possible to specify ahead of time which cluster a particular tool should run on. Therefore, is it possible to choose the cluster at the time the tool is actually used, when composing the workflow in Galaxy?

Thank you,
Saliya

--
Saliya Ekanayake
esaliya(a)gmail.com
Cell 812-391-4914
Home 812-961-6383
http://saliya.org


BWA - failed to resolve dependency error
by Shantanu Pavgi (Campus) 12 Jul '13


I had migrated the BWA tool while at the release_2013.02.08 revision. It was working fine; however, it fails to run with the following warning message after updating Galaxy to release_2013.06.03.

{{{
Building dependency shell command for dependency 'bwa'
galaxy.tools WARNING 2013-07-11 17:16:28,429 Failed to resolve dependency on 'bwa', ignoring
}}}

The bwa tool dependency is installed and present on the system. I am not sure why it's not being detected after the update. I can fix it by sourcing the env.sh file before starting the Galaxy server process, but that's not a good solution. Any pointers on resolving this issue would be really helpful.

Also, Galaxy is unable to fetch the latest update for the bwa_wrapper. The migration script had installed the first release of the bwa_wrapper, but now there is an updated revision available. I tried getting updates, but Galaxy thinks its current bwa revision is up to date.

--
Thanks,
Shantanu


Lessen load on server
by Auerbach, Kenneth R. 12 Jul '13


Hello,

Can you tell me if there is any way to lessen the load of Galaxy on our server? Even when it's idle, it's consuming a lot of resources.

Thank you.
Ken.
