Apologies for a simple question; I cannot seem to find an answer
looking through the docs and the Galaxy tool XML files in the distribution.
Is there a way to have a select menu with no default selection (so the
form is empty when it loads), where the user must still pick something
or one of those nice red validator messages appears telling them to do
so? For example:
<param name="organism" type="select" force_select="true" label="Organism">
  <option value="Homo_sapiens">Homo sapiens</option>
  <option value="Mus_musculus">Mus musculus</option>
  <option value="Rattus_norvegicus">Rattus norvegicus</option>
</param>
If I use a validator of type="empty_field" or "no_options" it doesn't
work: the user can hit Execute with an empty select and the tool still
tries to execute.
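For what it's worth, the variant I would try next is an explicit blank first option paired with the validator inside the param. This is only a sketch of something I haven't confirmed works; the blank option and the message text are my own guesses:

```xml
<param name="organism" type="select" label="Organism">
  <!-- blank option shown on load; the validator should reject it -->
  <option value="" selected="true"></option>
  <option value="Homo_sapiens">Homo sapiens</option>
  <option value="Mus_musculus">Mus musculus</option>
  <option value="Rattus_norvegicus">Rattus norvegicus</option>
  <validator type="empty_field" message="Please select an organism."/>
</param>
```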
I don't know if my approach is wrong; in one of my tools I tried to
create a conditional with a boolean parameter and I cannot get it to
work. E.g.
<param name="input1" type="boolean" checked="false" truevalue="dothis"
falsevalue="dothat" label="Test Checkbox"/>
<param name="input2" type="text" size="100" label="Test"/>
The form throws an error with:
Exception: ('No case matched value:', 'testCond', False)
Even if I remove truevalue and falsevalue from the param and change
the when values to "True" and "False", I still get the same error.
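In case it is useful, here is the full conditional I would try as a workaround, using a select test parameter instead of a boolean (the names come from my example above; whether booleans are supported as conditional test params at all is exactly what I'm unsure of):

```xml
<conditional name="testCond">
  <param name="input1" type="select" label="Test Checkbox">
    <option value="dothis">Do this</option>
    <option value="dothat">Do that</option>
  </param>
  <when value="dothis">
    <param name="input2" type="text" size="100" label="Test"/>
  </when>
  <when value="dothat"/>
</conditional>
```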
Is it possible to configure Galaxy to use a remote cluster? In our
setup it is not possible to install the Galaxy front end on the
cluster head node. I would like to know whether the main Galaxy
application can connect to a compute cluster on a remote network.
Thanks in advance!
I'd like to be able to use the "Get Microbial Data" tool in our local
Galaxy install, which appears to allow access to a local copy of the
NCBI "Bacteria" FTP site.
From looking at the tool's source code, I see I must populate the
microbial_data.loc file; however, the microbial_data.loc.sample file
is not very helpful:
#This is a sample file distributed with Galaxy that enables tools
#to retrieve microbial data via a URL
What this doesn't tell me is the meaning of the columns. Apparently
this is really three tables in one, distinguished by the first entry.
ORG entries are used by this tool for the selection of the kingdom
and species. They appear to have the following columns:
0. The "ORG" column itself, not counted in the XML offsets
5. Comma separated list of chromosomes/plasmids
6. URL for NCBI genome project
The CHR entries don't seem to be used directly by this tool.
There is one entry per chromosome/plasmid.
0. The "CHR" entry, not counted in the XML offsets
2. Description including species and chromosome/plasmid
4. Length of sequence (nucleotides)
5. GI number
7. URL for NCBI nucleotide database
Then there are the DATA entries, which appear to reference
local files. There are multiple DATA entries per CHR entry:
0. The "DATA" entry, not counted in the XML offsets
1. Identifier (composite of ORG id, CHR id, and data type)
2. Identifier of ORG line
3. Identifier of CHR line
4. Data type (CDS, tRNA, rRNA, sequence, GeneMark, Glimmer3)
5. File format (fasta or bed)
What I want to do is generate a microbial_data.loc file
from a local mirror of ftp://ftp.ncbi.nih.gov/genomes/Bacteria/
In addition to understanding the loc file format, it also seems I need
to generate some BED files from the NCBI-provided data; e.g. for
NC_008265, one of the examples in the sample loc files, I'd need the
following files:
Referring to the NCBI FTP site for this organism, we have:
I can see for example how to map *.ptt (protein tables) into *.CDS.bed,
and similarly for the Glimmer3 and GeneMark predictions. I could
also probably parse *.gbk to generate BED tabular files for any
annotated tRNA and rRNA entries (and the CDS entries, of course).
But rather than reinventing the wheel, how do you do this at Penn State?
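For concreteness, the *.ptt to BED mapping I have in mind would look roughly like the sketch below. This is only my assumption about the usual .ptt layout (two header lines, a column-name row, then tab-separated rows whose first field is "start..stop"); the function name, the choice of the Gene/Synonym column for the BED name, and the chromosome argument are all my own inventions:

```python
def ptt_to_bed(ptt_lines, chrom):
    """Convert rows of an NCBI *.ptt protein table into BED lines."""
    bed = []
    for line in ptt_lines:
        fields = line.rstrip("\n").split("\t")
        # Data rows have several tab-separated fields and a "start..stop" location;
        # this skips the title, protein-count, and column-name lines.
        if len(fields) < 6 or ".." not in fields[0]:
            continue
        start, stop = fields[0].split("..")
        # Prefer the Gene column; fall back to the Synonym (locus tag) column.
        name = fields[4] if fields[4] != "-" else fields[5]
        # BED uses 0-based starts and 1-based ends; .ptt is 1-based throughout.
        bed.append("\t".join([chrom, str(int(start) - 1), stop,
                              name, "0", fields[1]]))
    return bed
```

The tRNA/rRNA case would presumably need a proper GenBank parser for the *.gbk files rather than this kind of line splitting.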
Also, I'd like to offer access to the chromosome, CDS, tRNA, and rRNA
sequences themselves (as FASTA files, not just bed tabular). Am I right
that currently the "Get Microbial Data" tool doesn't offer this?
I've noticed from a couple of examples that when defining the
command-line string in a tool's XML wrapper (i.e. the Cheetah
template), you can use $input_param_name.extension (or apparently
$input_param_name.ext) to get the file format (metadata) for an input
parameter called input_param_name (while of course $input_param_name
alone gives the filename for that param).
This doesn't seem to be documented on the tool config wiki:
Most of the examples I found in the tool XML use the longer form
(.extension), although interestingly, in a recent commit Dan used just
.ext in the fastx_clipper.xml file to special-case Sanger FASTQ. Is
that a valid alternative too?
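To make the usage I mean concrete, a sketch of a command template along these lines (the tool name and flags here are hypothetical, not taken from any real wrapper):

```xml
<command>
  mytool --format $input_param_name.ext
         --input $input_param_name
         --output $output
</command>
```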
I'm curious about the naming here - did Galaxy once use
the format name as the actual file extension? Now at least
all data files seem to have .dat as the extension (which I'm
sure presents a small problem for some command line tools
which make file format inferences from the filename extension,
requiring hacks in their Galaxy wrappers).
I am currently setting up a Galaxy instance on a cluster.
My team intends to analyze data from NGS experiments.
Our different pipelines/workflows will be integrated into Galaxy.
It might be a simple question, but I can't find useful information on this.
When looking back at a result file, a crucial need for us is to be able to determine:
- who ? (generated this file)
- when ? (precise date and time)
- what ? (applied workflow)
- how ? (applied parameters in each workflow 'box')
- and things like computational time, if possible.
Is there a way to retrieve this information in Galaxy?
Log files, or a table in the database that we could query?
As far as I currently understand, histories are managed on a per-user
basis, allowing one to save and share histories. Does this enable one
to reproduce exactly the same pipeline on new input data, without
having to re-specify the parameters used ('applying a workflow instance')?
Workflows may indeed be modified/improved frequently, so we have to
keep a record of which 'version' was used to generate our results.
And where are the output files stored? Is it possible to force Galaxy
to put a result file in a pre-specified place?
It seems that once you have given Galaxy the input files, you do not
control where the output files will be stored (it is managed
internally); all you have is a link to the output in your history panel.
For the input files, I saw that there is a way to keep Galaxy from
duplicating them on upload, so you preserve your tree structure, which
is especially useful when dealing with huge NGS data. Does this kind
of flexibility exist for output files?
I'm using the nglims build via "hg clone http://www.bx.psu.edu/hg/galaxy galaxy_dist" as mentioned in the wiki, Apache proxy and Postgres DB enabled.
Please correct me if any of my initial "goals" below are unattainable or if I am misinterpreting Galaxy's capabilities. I'm still attempting to learn the overall system capabilities of Galaxy and its integration between nglims samples+projects and the native Sample/Seq tracking components.
Goals I'm trying to accomplish:
(1) Create a fresh custom sample form in the Admin tab's "Manage Form Definitions" to use in the Lab tab's (nglims section) "next gen sample submission" process, and assign this custom sample form to be used as the "default" form when creating a new sample in nglim section's "next gen sample submission",
(2) Allow multiple/various defined samples created with this custom sample submission form to be submitted in the Lab tab's "Submit samples as a project" as a project for tracking in the Admin tab section,
(3) Allow "Find Samples" in the Admin tab to potentially query for nglims samples created with this sample form.
(1) Can samples created in the nglims section currently be tracked in the Admin tab's "Sample Tracking" area, or does the sample tracking functionality only allow tracking of samples created with the native form definitions (within the Admin section, outside the Lab tab's "next gen sample submission")? A prior posting a while back mentioned the following (verbatim):
"You probably don't want to mix the "nglims" approach with Galaxy's
native sample tracking functionality. While I've build the nglims
part to use the same database tables and be fairly interoperable,
it does take some different data representation choices, especially
with regards to the sample/request relationship."
Does this mean we're limited to creating samples outside of the nglims section if we want to track them within the native sequencing section? Secondly, will there be future support for tracking nglims-created samples with the native sample/seq tracking?
(2) How do I change which sample form is used in the Lab tab's "Define samples and services"? In fact, I cannot determine which form definition corresponds to the form that appears when one clicks the Lab tab's "next gen sample submission".
When creating and/or editing a form in the Admin tab's "manage form definitions", I've created a sample submission form with multiple grids, some of which have multiple fields beneath them. Each time I save and re-edit the form after assigning fields to their respective grids, all of the fields are reset to whatever is listed as the first grid in the "Select the grid layout to place this field" dropdown menu. Adding or removing a grid with the submit button does the same thing. Any thoughts?
Galaxy appears to be a very promising product and we're looking forward to having the researchers and sequence team use this environment, once we finalize the configuration and push into production. I'm sure I'll have additional questions once I work through some of these current issues. My apologies for the extended length of this post and if I've somehow misinterpreted some of the functionalities and compatibilities.
Thanks in advance for your feedback.
Greg and rc;
I've been trying to keep up with the latest changes to the
FormValues table, and am running into one issue with the upgrade.
During the conversion of the field from a list to a dictionary, the
script assumes all of the values are strings and strips out any quote
characters. I'm using this table to store more complex objects such
as dictionaries, and they are corrupted by that change.
Is this step necessary? The to_json_string call should quote these
characters appropriately.
This patch removes the quote stripping, so it should be safe to
perform the conversion without changing content:
I'm sure this has been covered somewhere, but I haven't been able to
find it on the wiki or by searching the list archives. Can I, and if
so, how do I, migrate all user accounts from a previous local Galaxy
development install to a new production one (a brand-new install in a
separate location on the same machine), preserving account info,
workflows, histories, data, etc.?
If this can be done, can it be done selectively, i.e., for only certain accounts?
Thanks for any help offered,
Dean A. Snyder
Center for Inherited Disease Research (CIDR)
Johns Hopkins School of Medicine
Bayview Research Campus
333 Cassell Dr, Triad Bldg, Suite 2000
Baltimore, MD 21224
cell:717 668-3048 office:410-550-4629