Apologies for a simple question; I cannot seem to find an answer
looking through the docs and the Galaxy tool XML files in the distribution.
Is there a way to have a select menu with no default selection (so the
form is empty when it loads), where the user must still pick something
or one of those nice red validator messages appears telling them to do
so? For example:
<param name="organism" type="select" force_select="true" label="Organism">
  <option value="Homo_sapiens">Homo sapiens</option>
  <option value="Mus_musculus">Mus musculus</option>
  <option value="Rattus_norvegicus">Rattus norvegicus</option>
</param>
If I use a validator of type="empty_field" or "no_options" it doesn't
work: the user can hit Execute with an empty select and the tool still
tries to execute.
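For what it's worth, the variant I would try next is an explicit blank first option paired with the validator inside the param. This is only a sketch of something I haven't confirmed works; the blank option and the message text are my own guesses:

```xml
<param name="organism" type="select" label="Organism">
  <!-- blank option shown on load; the validator should reject it -->
  <option value="" selected="true"></option>
  <option value="Homo_sapiens">Homo sapiens</option>
  <option value="Mus_musculus">Mus musculus</option>
  <option value="Rattus_norvegicus">Rattus norvegicus</option>
  <validator type="empty_field" message="Please select an organism."/>
</param>
```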
I don't know if my approach is wrong; in one of my tools I tried to
create a conditional with a boolean parameter and I cannot get it to
work. E.g.
<param name="input1" type="boolean" checked="false" truevalue="dothis"
falsevalue="dothat" label="Test Checkbox"/>
<param name="input2" type="text" size="100" label="Test"/>
The form throws an error with:
Exception: ('No case matched value:', 'testCond', False)
Even if I remove truevalue and falsevalue from the param and change
the when values to "True" and "False", I still get the same error.
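In case it is useful, here is the full conditional I would try as a workaround, using a select test parameter instead of a boolean (the names come from my example above; whether booleans are supported as conditional test params at all is exactly what I'm unsure of):

```xml
<conditional name="testCond">
  <param name="input1" type="select" label="Test Checkbox">
    <option value="dothis">Do this</option>
    <option value="dothat">Do that</option>
  </param>
  <when value="dothis">
    <param name="input2" type="text" size="100" label="Test"/>
  </when>
  <when value="dothat"/>
</conditional>
```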
Is it possible to configure Galaxy to use a remote cluster? In our
setup it is not possible to install the Galaxy front end on the
cluster head node. I would like to know whether the main Galaxy
application can connect to a compute cluster on a remote network.
Thanks in advance!
I'd like to be able to use the "Get Microbial Data" tool in our local
Galaxy install, which appears to allow access to a local copy of the
NCBI "Bacteria" FTP site.
From looking at the tool's source code, I see I must populate the
microbial_data.loc file; however, the microbial_data.loc.sample file
is not very helpful:
#This is a sample file distributed with Galaxy that enables tools
#to retrieve microbial data via a URL
What this doesn't tell me is the meaning of the columns. Apparently
this is really three tables in one, distinguished by the first entry.
ORG entries are used by this tool for the selection of the kingdom
and species. They appear to have the following columns:
0. The "ORG" column itself, not counted in the XML offsets
5. Comma separated list of chromosomes/plasmids
6. URL for NCBI genome project
The CHR entries don't seem to be used directly by this tool.
There is one entry per chromosome/plasmid.
0. The "CHR" entry, not counted in the XML offsets
2. Description including species and chromosome/plasmid
4. Length of sequence (nucleotides)
5. GI number
7. URL for NCBI nucleotide database
Then there are the DATA entries, which appear to reference
local files. There are multiple DATA entries per CHR entry:
0. The "DATA" entry, not counted in the XML offsets
1. Identifier (composite of ORG id, CHR id, and data type)
2. Identifier of ORG line
3. Identifier of CHR line
4. Data type (CDS, tRNA, rRNA, sequence, GeneMark, Glimmer3)
5. File format (fasta or bed)
What I want to do is generate a microbial_data.loc file
from a local mirror of ftp://ftp.ncbi.nih.gov/genomes/Bacteria/
In addition to understanding the loc file format, it also seems I need
to generate some BED files from the NCBI-provided data; e.g. for
NC_008265, one of the examples in the sample loc files, I'd need the
following files:
Referring to the NCBI FTP site for this organism, we have:
I can see for example how to map *.ptt (protein tables) into *.CDS.bed,
and similarly for the Glimmer3 and GeneMark predictions. I could
also probably parse *.gbk to generate BED tabular files for any
annotated tRNA and rRNA entries (and the CDS entries, of course).
But rather than reinventing the wheel, how do you do this at Penn State?
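For concreteness, the *.ptt to BED mapping I have in mind would look roughly like the sketch below. This is only my assumption about the usual .ptt layout (two header lines, a column-name row, then tab-separated rows whose first field is "start..stop"); the function name, the choice of the Gene/Synonym column for the BED name, and the chromosome argument are all my own inventions:

```python
def ptt_to_bed(ptt_lines, chrom):
    """Convert rows of an NCBI *.ptt protein table into BED lines."""
    bed = []
    for line in ptt_lines:
        fields = line.rstrip("\n").split("\t")
        # Data rows have several tab-separated fields and a "start..stop" location;
        # this skips the title, protein-count, and column-name lines.
        if len(fields) < 6 or ".." not in fields[0]:
            continue
        start, stop = fields[0].split("..")
        # Prefer the Gene column; fall back to the Synonym (locus tag) column.
        name = fields[4] if fields[4] != "-" else fields[5]
        # BED uses 0-based starts and 1-based ends; .ptt is 1-based throughout.
        bed.append("\t".join([chrom, str(int(start) - 1), stop,
                              name, "0", fields[1]]))
    return bed
```

The tRNA/rRNA case would presumably need a proper GenBank parser for the *.gbk files rather than this kind of line splitting.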
Also, I'd like to offer access to the chromosome, CDS, tRNA, and rRNA
sequences themselves (as FASTA files, not just bed tabular). Am I right
that currently the "Get Microbial Data" tool doesn't offer this?
I've noticed from a couple of examples that when defining the
command-line string in a tool's XML wrapper (i.e. the Cheetah
template), you can use $input_param_name.extension (or apparently
$input_param_name.ext) to get the file format (metadata) for an input
parameter called input_param_name (while of course $input_param_name
alone gives the filename for that param).
This doesn't seem to be documented on the tool config wiki:
Most of the examples I found in the tool XML use the longer form
(.extension), although interestingly, in a recent commit Dan used just
.ext in the fastx_clipper.xml file to special-case Sanger FASTQ. Is
that a valid alternative too?
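To make the usage I mean concrete, a sketch of a command template along these lines (the tool name and flags here are hypothetical, not taken from any real wrapper):

```xml
<command>
  mytool --format $input_param_name.ext
         --input $input_param_name
         --output $output
</command>
```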
I'm curious about the naming here - did Galaxy once use
the format name as the actual file extension? Now at least
all data files seem to have .dat as the extension (which I'm
sure presents a small problem for some command line tools
which make file format inferences from the filename extension,
requiring hacks in their Galaxy wrappers).
I am currently setting up a Galaxy instance on a cluster.
My team intends to analyze data from NGS experiments.
Our different pipelines/workflows will be integrated into Galaxy.
It might be a simple question, but I can't find useful information on this.
When looking back at a result file, a crucial need for us is to be able to determine:
- who ? (generated this file)
- when ? (precise date and time)
- what ? (applied workflow)
- how ? (applied parameters in each workflow 'box')
- and things like computational time, if possible.
Is there a way to retrieve this information in Galaxy?
Log files, or a table in the database that we could query?
As far as I currently understand, histories are managed on a per-user
basis, allowing one to save and share histories. Does this enable one
to reproduce exactly the same pipeline on new input data, without
having to re-specify the parameters used ('applying a workflow instance')?
Workflows may indeed be modified/improved frequently, so we have to
keep a record of which 'version' was used to generate our results.
And where are the output files stored? Is it possible to force Galaxy
to put a result file in a pre-specified place?
It seems that once you have given Galaxy the input files, you do not
control where the output files will be stored (it is managed
internally); all you have is a link to the output in your history panel.
For the input files, I saw that there is a way to keep Galaxy from
duplicating them on upload, so you preserve your tree structure, which
is especially useful when dealing with huge NGS data. Does this kind
of flexibility exist for output files?
I'm using the nglims build via "hg clone http://www.bx.psu.edu/hg/galaxy galaxy_dist" as mentioned in the wiki, Apache proxy and Postgres DB enabled.
Please correct me if any of my initial "goals" below are unattainable or if I am misinterpreting Galaxy's capabilities. I'm still attempting to learn the overall system capabilities of Galaxy and its integration between nglims samples+projects and the native Sample/Seq tracking components.
Goals I'm trying to accomplish:
(1) Create a fresh custom sample form in the Admin tab's "Manage Form Definitions" to use in the Lab tab's (nglims section) "next gen sample submission" process, and assign this custom sample form to be used as the "default" form when creating a new sample in nglim section's "next gen sample submission",
(2) Allow multiple/various defined samples created with this custom sample submission form to be submitted in the Lab tab's "Submit samples as a project" as a project for tracking in the Admin tab section,
(3) Allow "Find Samples" in the Admin tab to potentially query for nglims samples created with this sample form.
(1) Can samples created in the nglims section currently be tracked in the Admin tab's "Sample Tracking" area, or does the sample tracking functionality only allow tracking of samples created with the native form definitions (within the Admin section, outside the Lab tab's "next gen sample submission")? A prior posting a while back mentioned the following (verbatim):
"You probably don't want to mix the "nglims" approach with Galaxy's
native sample tracking functionality. While I've build the nglims
part to use the same database tables and be fairly interoperable,
it does take some different data representation choices, especially
with regards to the sample/request relationship."
Does this mean we're limited to creating samples outside of the nglims section if we want to track them within the native sequencing section? Secondly, will there be future support for tracking nglims-created samples with the native sample/seq tracking?
(2) How do I change which sample form is used in the Lab tab's "Define samples and services"? In fact, I cannot determine which form definition corresponds to the form that appears when one clicks the Lab tab's "next gen sample submission".
When creating and/or editing a form in the Admin tab's "manage form definitions", I've created a sample submission form with multiple grids, some of which have multiple fields beneath them. Each time I save and re-edit the form after assigning fields to their respective grids, all of the fields are reset to whatever is listed as the first grid in the "Select the grid layout to place this field" dropdown menu. Adding or removing a grid with the submit button does the same thing. Any thoughts?
Galaxy appears to be a very promising product and we're looking forward to having the researchers and sequence team use this environment, once we finalize the configuration and push into production. I'm sure I'll have additional questions once I work through some of these current issues. My apologies for the extended length of this post and if I've somehow misinterpreted some of the functionalities and compatibilities.
Thanks in advance for your feedback.
Greg and rc;
I've been trying to keep up with the latest changes to the
FormValues table, and am running into one issue with the upgrade.
During the conversion of the field from a list to a dictionary, the
script assumes all of the values are strings and strips out any quote
characters. I'm using this table to store more complex objects such
as dictionaries, and they are corrupted by that change.
Is this step necessary? The to_json_string call should quote these
characters appropriately.
This patch removes the quote stripping, so it should be safe to
perform the conversion without changing content:
I'm sure this has been covered somewhere, but I haven't been able to
find it on the wiki or by searching the list archives. Can I, and if
so, how do I, migrate all user accounts from a previous local Galaxy
development install to a new production one (a brand-new install in a
separate location on the same machine), preserving account info,
workflows, histories, data, etc.?
If this can be done, can it be done selectively, i.e., for only certain accounts?
Thanks for any help offered,
Dean A. Snyder
Center for Inherited Disease Research (CIDR)
Johns Hopkins School of Medicine
Bayview Research Campus
333 Cassell Dr, Triad Bldg, Suite 2000
Baltimore, MD 21224
cell:717 668-3048 office:410-550-4629