Hi Galaxy devs,
I've installed Galaxy locally and I see that even though users remove
their histories using the "Delete history" option, the used space is not
freed. As I understand from reading the Galaxy documentation, after
deleting a history, the history is marked as "Deleted" but the files are
not removed; after 60 days, Galaxy automatically removes them.
> 1. If you delete a specific history using the **Options** link at the
> top of the history panel, that history and all of its associated datasets
> will be removed from disk 60 days after you deleted the history.
> 2. Those specific history items ( datasets ) that you delete from one
> of your histories by clicking the "X" icon in the history item will be
> removed from disk after 60 days, but unless you manually delete your
> history, you will still be able to view the history itself ( only the
> dataset that you deleted from your history will be removed from disk ).
I suppose this is the default behavior for local installations too,
but I did not configure anything about it, and I need to confirm this is
true because the used space grows bigger each day!
Is there any way to change this periodicity to fewer days?
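For reference: on a local instance, disk space is only reclaimed when you run the cleanup scripts shipped in scripts/cleanup_datasets/ yourself; the 60-day schedule described in the documentation is what the public server's administrators run, not something Galaxy does on its own. A hedged sketch of a nightly wrapper you could call from cron follows. The paths are examples only, and the flag names are from the cleanup_datasets.py of this era; verify them against `python ./scripts/cleanup_datasets/cleanup_datasets.py --help` on your own install.

```shell
#!/bin/sh
# purge_deleted.sh -- hypothetical nightly cleanup wrapper (run via cron).
# -d 10 : act on items deleted more than 10 days ago (pick your own window)
# -r    : actually remove the files from disk, not just flag them
cd /path/to/galaxy-dist
python ./scripts/cleanup_datasets/cleanup_datasets.py universe_wsgi.ini -d 10 -r --purge_histories
python ./scripts/cleanup_datasets/cleanup_datasets.py universe_wsgi.ini -d 10 -r --purge_datasets
```

This is a configuration/maintenance fragment, not runnable outside a Galaxy install; run it once by hand with `--info_only` first (if your revision supports it) to see what would be removed.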
Thank you and regards,
Rafa Hernández de Diego
Genomics of Gene Expression Lab.
Bioinformatics and Genomics Department
Prince Felipe Research Centre (CIPF)
C/ Eduardo Primo Yúfera 3
46012 Valencia, Spain
The minimum requirements for Galaxy are pretty much any recent model
computer. A basic install will load and run within a very short amount
of time, following the instructions at:
Advanced tools come from the Tool Shed. These are installed separately.
No programming is needed, but some very basic unix skills are required,
as they would be for ongoing server maintenance. The Tool Shed is one of
the best documented parts of Galaxy, so you should be able to find all
the answers there; this is also the right list to use for questions about
local installs (galaxy-dev(a)bx.psu.edu):
However, running a production instance that is intended to run compute
intensive tools (such as Tophat2), and where you have some throughput
goals, will require more substantial resources. This is always a
difficult question to answer since so much depends on the tools used and
the data volume. But in general, minimum requirements are about
the same as what the underlying tools would require on their own, if
run at the command line. So for Tophat or Tophat2, and really the entire Tuxedo
RNA-seq tool package, you might be able to get by on 8G memory and 2 or
4 cores. But it will probably be slow, and if you are running replicates
through Cuffdiff at the end, you might run out of memory if the files
are large and the genome is large (such as human). And if you are
hosting a Galaxy web server at the same time, with visualizations and
such, well, this is why systems are often set up with clusters. With all
of these competing for the same resources, going low can work, but
this will be something that has to be managed/tested, and it will
change over time as tools upgrade.
You could test this out by setting up a cloud instance with the hardware
you plan to use, loading some of your data, and running your workflow to
see how this benchmarks. Have users on the instance while you are doing
this, to judge performance. Cloud installations come with many of the
advanced tools and data indexes already installed/configured, so this
would be less of an investment than buying the hardware first and finding
out later it was not enough.
And of course Slipstream Galaxy is an option. The whole intention here
is to make a complete package with tools & data already configured, in a
system that has enough compute capability to do the work, for
scientists/labs who do not want to deal with administrative tasks or
work in the cloud (for whatever reason).
Good luck with your decision! Others are welcome to comment on how
they have set up their systems.
On 7/15/13 7:14 AM, Zain A Alvi wrote:
> Hi Jen,
> I hope this reaches you well. I have a small question in regards to
> setting up a galaxy server. My mentors and I are looking into buying a
> server for doing NGS analysis through the use of Galaxy. We saw that
> slipstream's specifications for hardware to be the following:
> CPU: 2x Intel Xeon Processor E5-2690, 8 core (16 cores total)
> RAM Memory: 12x 32GB RDIMM (384 GB) with option to upgrade it to 512 GB
> Storage (Hard Drive space): 7x 3TB SAS 6 Gbps (16 TB usable) with 1 x
> 100GB Solid State Disk
> Power: Dual Redundant Power Supplies
> Network: Dual Gigabit Network Adapter
> We are wondering what can be the minimum server hardware
> specifications that Galaxy be run on.
> Our second question is: if we install Galaxy on the server, do all the
> tools currently available on Galaxy come pre-installed, or do we
> have to program (via Perl and Python) and install each of those tool
> sets ourselves? If we have to install those tools ourselves, is there
> a guide for doing so? Lastly, how can we upgrade the tool sets,
> such as Tophat 1.44 to Tophat 2 on this server. I was wondering about
> the last question as Tophat 1.44 is available on the main Galaxy
> server whereas Tophat 2 is available on the test Galaxy server.
> Sorry for so many questions. Thank you again for all the great help.
Galaxy Support and Training
Our lab recently installed a local version of Galaxy on a mid-2012 Mac
Pro computer. We can access the Galaxy server and sign in as an
administrator. Today we tried creating a Data Library, adding a
dataset to it, and uploading a directory of files. We followed the
Galaxy documentation at
to set up this feature:
- Admin > Data Library > Add datasets > Upload directory of files
- file format was set to auto-detect
- and we chose the option to link to files instead of copying them
Galaxy confirmed that the files were successfully uploaded. However,
in the data library, under the Message column, is a message in red
saying "Job error (click name for more info)". Clicking on one of the
uploaded files displays a page with this information:
Date uploaded: 2013-06-28
File size: 7.5 GB
Data type: auto
Job Standard Error
Traceback (most recent call last):
line 386, in
line 357, in __main__
output_paths = parse_outputs( sys.argv[4:] )
line 64, in parse_outputs
id, files_path, path = arg.split( ':', 2 )
ValueError: need more than 1 value to unpack
Number of data lines: None
Disk file: /Volumes/G-SPEED Q/data/Person 2012 project/DCP2 mef.fastq
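One plausible cause, given that the disk file path above contains spaces ("/Volumes/G-SPEED Q/data/Person 2012 project/..."): if the upload tool passes that linked path unquoted on a command line, the shell splits it at each space, and the fragments that reach parse_outputs() contain no ':' separators, which produces exactly this unpack error. A minimal sketch of the failure mode (the split call mirrors the traceback; all values are illustrative):

```python
def parse_output_arg(arg):
    # Mirrors the failing line from the traceback:
    #   id, files_path, path = arg.split( ':', 2 )
    dataset_id, files_path, path = arg.split(':', 2)
    return dataset_id, files_path, path

# A well-formed "<id>:<files_path>:<path>" argument unpacks cleanly:
print(parse_output_arg('42:/tmp/files:/tmp/dataset_42.dat'))

# But a fragment of a space-split path has no colons, so split()
# yields a single value and the 3-way unpack raises ValueError:
try:
    parse_output_arg('Q/data/Person')
except ValueError as e:
    print('unpack failed:', e)
```

If that is the cause, a path without spaces (or copying instead of linking) would be a quick way to test it.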
1. Should we be concerned about this error?
2. If so, what is the right way to fix it?
3. If not, how do we remove the red error message next to each file:
"Job error (click name for more info)" ?
I've tried using the Data Manager (Admin > Data > Manage local data
(beta)) to install builds for BWA and Samtools on my local Galaxy instance.
Previous to using the Data Manager, I used to add the build to
tool-data/shared/ucsc/builds.txt, create the .fai indexes (for samtools)
from the command line, add them to tool-data/sam_fa_indices.loc and
restart Galaxy (obviously doing a similar thing for BWA and adding the
build to bwa_index.loc).
I thought I'd try using the Data Manager to add builds for BWA and
Samtools. The BWA builds work fine (I can map to the build), but when I
try to use SAM-to-BAM I get the error "Sequences are not currently
available for the specified build."
Using the Data Manager creates the directory tool-data/n_sylvestris/ which
contains the sub-dirs 'seq', 'bwa_index' and 'sam_index'.
'seq' contains a symlink to the n_sylvestris.fa sequence.
'sam_index' and 'bwa_index' both contain the sub-directory
'n_sylvestris', which contains a symlink to the symlink for
n_sylvestris.fa in 'seq', along with their respective n_sylvestris.fa.xxx
index files. OK - all good…
In tool-data/testtoolshed.g2.bx.psu.edu/repos/blankenberg/ there are three
directories: data_manager_bwa_index_builder, data_manager_sam_fa_index_builder
and data_manager_fetch_genome_all_fasta.
All three directories contain all_fasta.loc, tool_data_table_conf.xml and
tool_data_table_conf.xml.sample, and (for the sam and bwa dirs) their pertinent
index.loc files.
The data_manager_fetch_genome_all_fasta/all_fasta.loc file contains the
path to the fasta symlinks.
The all_fasta.loc files in the sam and bwa data_manager_index_builder
directories don't contain any uncommented lines.
The index.loc files in the sam and bwa data_manager_index_builder
directories point to:
As BWA runs fine, it's obviously reading the bwa_index.loc file from the
...but it's not reading the samtools indexes at:
For Galaxy to find the sam indexes, I have to go to the
tool-data/sam_fa_indices.loc file and manually insert into it the contents
So, I guess my question is: other than inserting the genome builds into
builds.txt, should I be doing any other configuration to get the Data Manager
to write, and Galaxy to read, its newly created builds? I find it
strange that the BWA builds work OK, but the Samtools ones don't.
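One thing worth checking while comparing the two: with the newer-style tool data tables, SAM-to-BAM resolves the 'sam_fa_indexes' table from whichever file tool_data_table_conf.xml points at (here tool-data/sam_fa_indices.loc), so entries the Data Manager writes into a different .loc file will not be seen unless that table definition also lists them. For reference, a correctly formed line for the three-column (line_type, value, path) format looks like the following; the path is hypothetical, reconstructed from the directory layout described above:

```
#<line_type>	<value>	<path>
index	n_sylvestris	/path/to/galaxy/tool-data/n_sylvestris/sam_index/n_sylvestris/n_sylvestris.fa
```

The columns must be separated by real TAB characters, and the value column is the dbkey you added to builds.txt.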
I've done a few greps for mentions of .loc files in Galaxy and the only
difference between the bwa and sam .loc files is that there is a file
tool-data/tool_data_table_conf.xml (plus a .sample version) which contains:
<!-- Use the file tool_data_table_conf.xml.oldlocstyle if you don't want
to update your loc files as changed in revision 4550:535d276c92bc-->
<!-- Location of SAMTools indexes and other files -->
<table name="sam_fa_indexes" comment_char="#">
<columns>line_type, value, path</columns>
<file path="tool-data/sam_fa_indices.loc" />
</table>
Could Galaxy be reading this file and ignoring the one in
Dr. Graham Etherington
Bioinformatics Support Officer,
The Sainsbury Laboratory,
Norwich Research Park,
Norwich NR4 7UH.
Tel: +44 (0)1603 450601
I am trying to connect my local database/web pages to my local Galaxy
server, but I am having a hard time implementing the "send output to
Galaxy" function. Right now I have a sending page that submits a form to
Galaxy, and it looks like everything goes well: from the json_params
variable (data_source.py), I can see that all parameters have been
received by Galaxy. The problem is: how do I get my parameters back in my
getfile.php file (this is used in the GALAXY_URL too, as the callback)? I
tried printing out $_POST, $_GET and $_REQUEST, and nothing shows up; I
did see some output from $_SERVER. I am using PHP. Please help!
We have just installed a local Galaxy instance on a workstation and I am
having issues getting references to show up for my samtools section. I have
set up many .loc files correctly for my other tools and I am able to see and
use them in Galaxy.
In the paster.log I see the following:
galaxy.tools.data WARNING 2013-07-10 17:05:06,034 Line 30 in tool data table
'sam_fa_indexes' is invalid (HINT: '<TAB>' characters must be used to
galaxy.tools.data WARNING 2013-07-10 17:05:06,034 Line 31 in tool data table
'sam_fa_indexes' is invalid (HINT: '<TAB>' characters must be used to
galaxy.tools.data DEBUG 2013-07-10 17:05:06,034 Loaded tool data table
I know for a fact that when I first created this file I had a space (not a
TAB) between the index and the build. After restarting the Galaxy service
several times and double-checking our loc files, I noticed the error and
replaced the space with a TAB.
Using vi (with :set list) we see that there are TABs in between the fields.
Even with the correct file, I still get the same error in the paster.log.
Does this get saved in the database? How do I resolve this?
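To the database question: as far as I know, .loc contents are not cached in the database; Galaxy re-parses the files referenced by tool_data_table_conf.xml at startup. A warning that survives a restart usually means a different copy of the file still has the bad line (for example, a second path listed in tool_data_table_conf.xml, or the .xml.sample being read instead). A small hypothetical helper to find any non-TAB-separated lines in a .loc file:

```python
def bad_loc_lines(text):
    """Return (line_number, line) pairs whose columns are not TAB-separated."""
    bad = []
    for n, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith('#'):
            continue  # blank lines and comments are ignored by the parser
        if '\t' not in line:
            bad.append((n, line))
    return bad

# Example .loc contents: line 2 is valid, line 3 uses spaces instead of TABs
sample = (
    "#<line_type>\t<dbkey>\t<path>\n"
    "index\thg19\t/data/hg19/hg19.fa\n"
    "index hg18 /data/hg18/hg18.fa\n"
)
print(bad_loc_lines(sample))  # -> [(3, 'index hg18 /data/hg18/hg18.fa')]
```

Running this over every sam_fa_indices.loc (and .sample) on the machine should show which copy Galaxy is actually complaining about.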
Thanks for the help!
I had migrated the BWA tool while at the release_2013.02.08 revision. It was working fine; however, it fails to run with the following warning message after updating Galaxy to release_2013.06.03.
Building dependency shell command for dependency 'bwa'
galaxy.tools WARNING 2013-07-11 17:16:28,429 Failed to resolve dependency on 'bwa', ignoring
The bwa tool dependency is installed and present on the system. I am not sure why it is not being detected after the update. I can work around it by sourcing the env.sh file before starting the Galaxy server process, but that is not a good solution. Any pointers on resolving this issue would be really helpful.
Also, Galaxy is unable to fetch the latest update for the bwa_wrapper. The migration script installed the first release of the bwa_wrapper, but now there is an updated revision available. I tried getting updates, but Galaxy thinks its current bwa revision is up to date.
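For the dependency warning: the manual ("galaxy_packages") resolver looks for an env.sh under tool_dependency_dir/<name>/<version>/, or under a 'default' symlink pointing at a version directory, so the warning usually means that layout is missing or the symlink was lost during the update. A sketch of the lookup it performs (this is illustrative, not Galaxy's actual code; the directory names are the convention, not my invention, but verify against your tool_dependency_dir setting in universe_wsgi.ini):

```python
import os

def find_env_sh(tool_dependency_dir, name, version=None):
    """Return the env.sh path Galaxy would source for a dependency,
    or None if the dependency cannot be resolved.

    With no version, fall back to the 'default' symlink, which is how
    version-less <requirement> tags are resolved.
    """
    base = os.path.join(tool_dependency_dir, name, version or 'default')
    env = os.path.join(base, 'env.sh')
    return env if os.path.exists(env) else None
```

So if `find_env_sh('/path/to/tool_dependencies', 'bwa', '0.5.9')` comes back None on your layout, creating the <version> directory with an env.sh that exports bwa onto PATH, plus a `default` symlink next to it, should let the resolver find it without sourcing env.sh by hand.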
Can you tell me if there is any way to lessen the load Galaxy puts on our server? Even when it is idle, it consumes a lot of resources.