January 2011 - galaxy-user - lists.galaxyproject.org

Re: [galaxy-user] Adding the Hydra genome to Galaxy
by Jennifer Jackson 06 Apr '11

Hello Rob,

We will add this to our to-do list for new genomes. Thanks for sending the GenBank information! Next time, if you could send requests to galaxy-user, that would be very helpful for the team.

Best,
Jen
Galaxy team

On 1/3/11 12:37 PM, Rob Steele wrote:
> Hi Jennifer,
> Would it be possible to get the Hydra genome assembly added to Galaxy?
> It has been published and is available in GenBank under accession number
> ABRM00000000.
>
> Cheers,
> Rob
>
> Rob Steele, Ph.D.
> Professor
> D240 Medical Sciences I
> Department of Biological Chemistry
> School of Medicine
> University of California, Irvine
> Irvine, CA 92697-1700
>
> phone: 949-824-7341
> e-mail: resteele(a)uci.edu
> fax: 949-824-2688
> web: http://polyp.biochem.uci.edu/wiki/index.php/Main_Page

--
Jennifer Jackson
http://usegalaxy.org

Peep view for history elements broken on IE?
by Peter 22 Feb '11

Hi all, I've recently been testing Galaxy on Microsoft Internet Explorer (IE6 and now IE7). It seems that the "peep" view for history entries isn't supported. The history elements' names are not links, so clicking on them does not expand them to show the info (e.g. number of sequences or file size, and the start of the data). This happens both on the public Galaxy instance at Penn State (http://usegalaxy.org) and on our local Galaxy instance. Is this a known issue? Peter

fastx_clipper.xml and FASTQ Illumina 1.3+?
by Peter 22 Feb '11

Hi all,

Should fastx_clipper.xml allow Illumina 1.3+ FASTQ as well as Sanger and Solexa FASTQ (and FASTA)? That is, replace this:

    <param format="fasta,fastqsolexa,fastqsanger" name="input" type="data" label="Library to clip" />

with this:

    <param format="fasta,fastqsolexa,fastqillumina,fastqsanger" name="input" type="data" label="Library to clip" />

Peter
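A minimal sketch of what the proposed edit changes, using only Python's standard xml.etree.ElementTree to parse the two <param> elements quoted above and report which datatypes the new version adds. The helper names are hypothetical; the script reads only the format attribute and does not consult Galaxy's datatype registry.

```python
import xml.etree.ElementTree as ET

# The current and proposed <param> elements quoted in the message above.
CURRENT_PARAM = ('<param format="fasta,fastqsolexa,fastqsanger" name="input" '
                 'type="data" label="Library to clip" />')
PROPOSED_PARAM = ('<param format="fasta,fastqsolexa,fastqillumina,fastqsanger" '
                  'name="input" type="data" label="Library to clip" />')


def accepted_formats(param_xml: str) -> list:
    """Return the datatypes listed in a <param format="..."> attribute, in order."""
    element = ET.fromstring(param_xml)
    return [fmt.strip() for fmt in element.get("format", "").split(",") if fmt.strip()]


def added_formats(old_xml: str, new_xml: str) -> list:
    """Return the formats accepted by new_xml that old_xml did not accept."""
    old = set(accepted_formats(old_xml))
    return [fmt for fmt in accepted_formats(new_xml) if fmt not in old]


if __name__ == "__main__":
    print(added_formats(CURRENT_PARAM, PROPOSED_PARAM))  # prints ['fastqillumina']
```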

Gene Name in Cufflink/compare/diff
by Matteo Bovolenta 15 Feb '11

Hi all, when I run an RNA-Seq analysis using TopHat, Cufflinks, Cuffcompare and Cuffdiff, aligning my data to the RefSeq genes, I obtain tables from Cufflinks/Cuffcompare/Cuffdiff that do not include the gene name, only the NM_ accession. Does anyone know how I can obtain all the tables with the gene name? Thank you all very much for the support. Best regards, Matteo -- Matteo Bovolenta, PhD Dipartimento di Medicina Sperimentale e Diagnostica Sezione di Genetica Medica Università di Ferrara Via Fossato di Mortara, 74 44100 Ferrara tel +39 0532 974449 (office) tel +39 0532 974502 (lab) fax +39 0532 236157 email bvlmtt(a)unife.it http://www.unife.it/medicina/geneticamedica http://www.bio-nmd.eu registered in ORPHANET http://www.orpha.net CONFIDENTIALITY NOTICE: this message together with its annexes may contain confidential, proprietary or legally privileged information and is intended only for the use of the addressee named above. No confidentiality or privilege is waived or lost by any mistransmission. If you are not the intended recipient of this message you are hereby notified that you must not use, disseminate, copy it in any form or take any action in reliance on it. If you have received this message in error please delete it and any copies of it and kindly inform the sender of this e-mail at bvlmtt(a)unife.it Thank you

Extract sequences from [gtf file] + [genome FASTA file]
by Karen Tang 10 Feb '11

Hi Galaxy people, I have transcripts predicted by Cufflinks that are in a gtf file. How can I extract the sequences corresponding to those transcripts, using Galaxy? [Cufflinks transcript predictions in gtf file] + [Genome sequence in FASTA file] ---> [FASTA file of transcript sequences] My genome is a custom genome (not at UCSC). --------- I'll also need to do the same thing, except my predicted transcripts are in a Scripture bed file. Thanks for your help! Karen Tang :) Plant Biology University of Minnesota

Galaxy for gene expression comparison
by Martin, David A. 08 Feb '11

Hello, I am comparing RNA expression in two groups of rats, a drug-treated group against a control group. There are 10 biological replicates in each group. I am unsure how to run this analysis through Galaxy using TopHat followed by Cufflinks/Cuffcompare/Cuffdiff. Should the files for each group be merged at any point? I would think they should be kept separate in order to properly account for the spread across animals. I am just a little unsure of how to group the files in Galaxy, and where to differentiate biological and technical replicates. On a different note, is there a way to control the Bowtie mapping parameters more closely when using TopHat? Thank you for any knowledge on these matters! -David Martin

Python error when running Bowtie for Illumina
by Weng Khong Lim 01 Feb '11

Hi all, I'm new to next-gen sequencing, so please be gentle. I've just received a pair of Illumina FASTQ files from the sequencing facility and intend to map them to the hg19 reference genome. I first used the FASTQ Groomer utility to convert the reads to Sanger FASTQ. However, when running Bowtie for Illumina on the resulting dataset with default settings, I received the following error: "An error occurred running this job: Error aligning sequence. requested number of bytes is more than a Python string can hold". Can someone help point out my mistake? My history is accessible at http://main.g2.bx.psu.edu/u/wengkhong_lim/h/chip-seq-pilot-batch Appreciate the help! Weng Khong, LIM Department of Genetics University of Cambridge E-mail: wkl24(a)cam.ac.uk Tel: +447503225832

January 31, 2011 Galaxy Development News Brief
by Jennifer Jackson 31 Jan '11

January 31, 2011 Galaxy Development News Brief
http://bitbucket.org/galaxy/galaxy-central/wiki/Features/DevNewsBrief/2011_…

----
Get Galaxy!
http://bitbucket.org/galaxy/galaxy-central/wiki/GetGalaxy

* new: % hg clone http://www.bx.psu.edu/hg/galaxy galaxy-dist
* upgrade: % hg pull -u -r 95d65755ac69

----
What's New

= Workflow Additions =

1) Usability improvements for workflow annotations
* Workflow annotation is now shown at the top of the page.
* Step annotations are shown in the step header rather than at the bottom.
example: Annotation http://bitbucket.org/galaxy/galaxy-central/wiki/Features/DevNewsBrief/2011_…

2) Easier to move workflows directly from one Galaxy instance to another
* The workflow download/export page now provides a URL that can be used to import a workflow directly from one instance into another.
example: URL import http://bitbucket.org/galaxy/galaxy-central/wiki/Features/DevNewsBrief/2011_…

3) New parameter settings for global application or specific actions
* Workflow parameters are a new feature we've added to simplify reuse of workflows and to allow easier variation of parameters when re-running a workflow.
* Instead of filling in explicit values when building a workflow, you can now use flexible parameters. To specify a workflow parameter, simply use a tag like ${my_variable_name} in any tool input field or in a rename dataset action field.
* The workflow shown below has two parameters, as shown in the Workflow Parameters display in the top right of the editor window. You can see the ${filter_condition} parameter in the right panel in both the tool input and the rename action.
** Note that while ${filter_condition} is only used in a single step in this simple demo workflow, variables can be used across steps.
example: Parameters http://bitbucket.org/galaxy/galaxy-central/wiki/Features/DevNewsBrief/2011_…
* As the inputs are filled in the Workflow Parameters box, seen in the runtime example below, the new values will be reflected in all workflow steps and will be used when the workflow is executed.
example: Runtime display http://bitbucket.org/galaxy/galaxy-central/wiki/Features/DevNewsBrief/2011_…

4) General workflow tuning
* The HideDataset action will no longer be shown in the workflow editor. The ability to create these actions manually has been deprecated in favor of the workflow outputs approach.
* Workflow run results can now be sent to a new history instead of the current one.
* Workflow steps are now ordered based on their layout in the workflow editor, by distance from the top left corner of the editor. This won't affect existing workflows until they are re-saved.
* Workflows that contain steps expecting tools that are unavailable (as might be the case for a workflow imported from another Galaxy instance) will now have problem nodes marked with an error state. The workflow cannot be saved until the steps are removed or the tools are added to the current Galaxy instance.

= Deferred Jobs & Managed Transfers =

These components are under rapid development and their interfaces should be considered experimental. They can be enabled by setting 'enable_beta_job_managers = True' in universe_wsgi.ini.

1) Deferred Jobs
* A generic method for creating a dependency on an event before executing arbitrary code has now been defined in: galaxy-dist/lib/galaxy/job/deferred/__init__.py
* The deferred job runner loads plugins found in the same directory, which implement the necessary methods check_job() and run_job(). check_job() returns a state that tells the deferred job runner whether it is okay to execute the run_job() method.
* The deferred job runner is independent of the regular tool-related job runner; it is not coupled with tools, nor does it have any integrated cluster support.
* No documentation is provided for the plugin format at this time, but a sample plugin will be included at a later date.

2) Transfer Manager
* Galaxy can now spawn persistent transfers of (unauthenticated) http and https URLs via the code in: galaxy-dist/lib/galaxy/job/transfer_manager.py
* The transfer manager is accessible in-application at app.transfer_manager.
* Transfers are daemonized and thus not affected by Galaxy restarts, although a loss of database connection (restarting the database server) will cause transfers to fail.
* A transfer can be polled for progress via JSON-RPC requests to its socket. An interface for this request is available in the transfer_manager class.
* Future enhancements will allow for authenticated http/https and scp.

----
Updated & Improved

= Data Content =
* New "Get Data" source: modENCODE modMine server.
* New Genomes (format: Genome [dbkey]):
** Pseudomonas aeruginosa (str. PA7) [16720]
** Pseudomonas aeruginosa (str. PAO1) [pseuAeru]
** Pseudomonas aeruginosa (str. UCBPP-PA14) [386]
** Silkworm (Bombyx mori str. p50T) [Bombyx_mori_p50T_2.0]
** Rice pathogen: Xanthomonas oryzae (str. KACC10331) [12931]
** Rice pathogen: Xanthomonas oryzae (str. MAFF 311018) [16297]
** Rice pathogen: Xanthomonas oryzae (str. PXO99A) [Xanthomonas_oryzae_PXO99A]
** Little Brown Bat (Myotis lucifugus) [myoluc2]

= Current Tools =
* Optional 'min' and 'max' attributes for tool integer and float parameters.
* Add --max and --un options to Bowtie.
* Validators enforce min/max values and generate error messages for invalid values.
* Add standard deviation operation to grouping tool.
* Enhance histogram tool to allow plotting as frequency/counts.
* Allow fastx_toolkit clipper wrapper to work on fastqsanger formatted files.
* SAM metadata detection and setting sped up dramatically upon import.
* Wig-to-bigWig converter tool default parameters are now the same as UCSC's.
* Cufflinks tool suite updates:
** Update Cufflinks tool suite wrappers to support v0.9.3.
** Add support for bias correction to Cufflinks and Cuffdiff; bias correction improves transcript quantitation results (FPKM values).
** Enable Cuffcompare to use sequence data so that it can generate data for use by Cuffdiff. Cuffcompare uses sam_fa_indices.loc to find the locally cached genome sequence data and indices needed for this option.
** Add normalization support to Cufflinks and Cuffdiff, and replicate support to Cuffdiff.

= New Tools =
* BED-to-bigBed converter tool now under "Convert Formats".
** Converts sorted BED files into UCSC's bigBed format.
** Requires bedToBigBed in PATH.
* BLAST+ tools are now included, commented out, in tool_conf.xml.sample.
** Not in Galaxy main; for local instance use.
** If you run the BLAST+ tools at your site, please be sure to uncomment them if updating to the latest tool_conf.xml.sample (this will be the default at the next update).
* Add SAMtools flagstat tool.
* Add CCAT ChIP-seq peak/region caller.
* Add BWA wrapper for SOLiD.

= Histories =
* Estimated size is now displayed for very large text-based (non-binary) datasets.
* Always show Galaxy masthead and enable Saved Histories to work with and without panels.
* New "Copy Datasets" link added under History Options dropdown.
** Dataset and history IDs are now encoded.
** JS dropdown used to change source history, so you no longer have to switch the desired source history to be "active".
** Remove link in "edit attributes" of datasets.
** Simplified destination interface by using a single select box by default, but also providing a link to show checkboxes for multiple destination histories.
** Use newly imported Inflector library to get correct plural/singular nouns on actions.
** Added arrow between source and destination.
** Removed checkbox for copying to a new history. Instead, a new history is created if the new history field is not blank.

= Data Libraries =
* Job information (stdout/stderr) from the job that created a library item is now available on the library item info page, which is helpful when library uploads fail.

= Trackster =
* Add bigWig display to Trackster. Wig is automatically converted to bigWig if needed (NOTE: datatypes_conf.xml.sample has been edited to add the new converter; you must update datatypes_conf.xml to use it). The converter requires that wigToBigWig be in the PATH, but no other tools are needed to view bigWig files, as they are provided by bx_python.
* Tuning
** Fix track preferences not being applied
** Fix chroms not being selectable when a new track browser is created
** Fix ReferenceTrack not working with filters
** Fix visual analytics error when tool configuration has changed
** Fix visualization saving on Chrome by using $.each instead of for loop
** Fix shared visualizations
** Fix tracks fetching from data provider when indexer returned None, and when
** Fix BAM reads without cigar string.

= Sample Tracking =
* Additions and tuning to improve tracking (complete documentation will be available soon).

= User Interface (UI) =
* Version info is now printed in the history item for Bowtie, BWA, Lastz, TopHat, Samtools, Cuffdiff, Cufflinks, Cuffcompare, BFAST, and PerM.
* Turn off web browser auto-complete for tool search (includes workflows).
* Grid changes resulting in better readability:
** Better page number display.
** Use "~" instead of "about", e.g. "about 2 hours ago" -> "~ 2 hours ago".
** Cell padding decreased.
** Added new "nowrap" parameter to prevent text from being wrapped. Currently only used for tags to prevent "X tags" from wrapping in the middle.

= Application Programming Interface (API) =
* Still 'alpha', but: initial pass completed for forms, request_type, users and roles. See README and examples in source.

= Source =
* Galaxy now runs with the system Python on a 64-bit Mac kernel.
* Enhance select parameter wrapper objects to provide access to additional fields by name for dynamically generated select lists (i.e. dynamic_options). Example: use ${param.fields.path} to access a path field.
* Updated the XML in filter specification for output files. The closing filter tag can now be on a separate line to use as an actual filter.
* Implemented 'from_work_dir' attribute for tool outputs. This attribute matches a file in the working directory to a tool output/HDA; when a tool finishes, the file in the working directory is automatically copied to the HDA. Hence, tool wrapper scripts no longer need to copy tool output files to HDA files manually.
* More programmatic control of page numbers on grid (custom UI).
* Many formerly undocumented options have been added to the universe_wsgi.ini.sample file. Please compare your working copy with the new .sample and determine whether any of the options are relevant to your environment.
* The "welcome page" found at static/welcome.html has been renamed to static/welcome.html.sample.
* The UCSC Genome Browser now supports loading data via https. If you implemented the former https->http method for Galaxy URLs in Apache, as per the wiki documentation, you will need to remove it and allow the browser through your authentication scheme over https.
* Explicitly convert autocomplete dropdown values on refresh_on_change.
* Adjust image links in tools to work with a proxy prefix (thanks Brad Chapman).

= Bug Fixes =
* Ensembl GTF files are recognized correctly.
* Make Tophat wrapper compatible with Python 2.4 by removing try-except-finally.
* Fix unicode error for dataset peeks.
* Sanitize tool links in tool menu so that they can be searched. Use only lowercase letters, numbers, and underscores in links to ensure cross-browser functionality.
* Fix for building form element for boolean tool parameter when the default state is configured as checked in the tool configuration, but the user-provided value is unchecked. Fixes an issue seen in workflows that prevents saving the unchecked state under the described conditions.
* Fix typo in BFAST.
* Trackster: fix BAM display bugs, Dense display bug, better dataset selection.
* Add validator to Tophat's segment mismatches parameter.
* Turn off web browser autocomplete for tool search in workflows.
* Added the ctypes egg so the DRMAA egg will now work under Python 2.4.
* Bug fixes to the Mac OS X launcher in the galaxy-dist/contrib/ folder (from Florent Angly).
* Python reports the wrong platform when running a 32-bit Python on 64-bit Linux, so Galaxy now forces the correct platform in this environment (thanks David Hoover).
* Fix Chrome not auto-saving changes to workflow checkboxes.

----
About Galaxy
Galaxy is supported in part by NSF, NHGRI, the Huck Institutes of the Life Sciences, and The Institute for CyberScience at Penn State.
Core Team http://bitbucket.org/galaxy/galaxy-central/wiki/GalaxyTeam
Use Galaxy! http://usegalaxy.org
GalaxyProject.org http://galaxyproject.org
Development Home http://bitbucket.org/galaxy/galaxy-central

----
Galaxy Team
January 31, 2011
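The ${name} tags described under "Workflow parameters" in the news brief above behave like template placeholders: one value typed at run time fills every step that uses the tag. A minimal sketch of that idea, assuming a hypothetical dict-based step layout (not Galaxy's actual workflow model) and using Python's string.Template, whose ${name} syntax happens to match the tags in the brief. Note that Template also treats a bare $name as a placeholder and raises ValueError on a stray $, which Galaxy's own substitution may not do.

```python
import re
from string import Template

# Matches ${name} placeholders; an approximation of the tag syntax in the brief.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def step_texts(step: dict) -> list:
    """All text fields of a step that may hold placeholders: tool inputs plus rename."""
    texts = list(step["inputs"].values())
    if step.get("rename"):
        texts.append(step["rename"])
    return texts


def find_parameters(steps: list) -> set:
    """Collect every ${name} used anywhere in the workflow (the 'Workflow Parameters' box)."""
    names = set()
    for step in steps:
        for text in step_texts(step):
            names.update(PLACEHOLDER.findall(text))
    return names


def bind_parameters(steps: list, values: dict) -> list:
    """Return new steps with each ${name} replaced by values[name].

    Template.substitute raises KeyError for a missing value, so an unfilled
    parameter is caught before anything would run. The input is not modified.
    """
    bound = []
    for step in steps:
        new_step = dict(step)  # shallow copy; inputs are rebuilt below
        new_step["inputs"] = {
            key: Template(text).substitute(values)
            for key, text in step["inputs"].items()
        }
        if step.get("rename"):
            new_step["rename"] = Template(step["rename"]).substitute(values)
        bound.append(new_step)
    return bound


# Hypothetical two-step workflow; only the first step uses the parameter.
demo_steps = [
    {"tool": "filter", "inputs": {"cond": "${filter_condition}"},
     "rename": "Filtered on ${filter_condition}"},
    {"tool": "sort", "inputs": {"column": "c1"}, "rename": None},
]

if __name__ == "__main__":
    print(find_parameters(demo_steps))
    for step in bind_parameters(demo_steps, {"filter_condition": "c5>10"}):
        print(step["tool"], step["inputs"], step["rename"])
```

Collecting the names first and substituting second mirrors the two moments the brief describes: the editor lists the parameters, and the run form supplies their values.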

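The Deferred Jobs section of the news brief above says only that plugins implement check_job() and run_job(), and that check_job() returns a state telling the runner whether run_job() may execute; no plugin format is documented. The sketch below illustrates that polling contract under loudly hypothetical assumptions: the JobState values, the DeferredJob fields, MarkerPlugin and poll_once are all invented for illustration and are not Galaxy's classes.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Protocol, Set


class JobState(Enum):
    """Hypothetical states; the brief only says check_job() returns "a state"."""
    WAIT = "wait"        # dependency not met yet: check again on the next pass
    READY = "ready"      # dependency met: safe to call run_job()
    INVALID = "invalid"  # dependency can never be met: drop the job


@dataclass
class DeferredJob:
    job_id: int
    plugin: str  # name of the (hypothetical) plugin that handles this job
    params: Dict[str, str] = field(default_factory=dict)


class DeferredJobPlugin(Protocol):
    def check_job(self, job: DeferredJob) -> JobState: ...
    def run_job(self, job: DeferredJob) -> None: ...


class MarkerPlugin:
    """Hypothetical plugin: waits until job.params["marker"] appears in a shared set."""

    def __init__(self, markers: Set[str]) -> None:
        self.markers = markers
        self.completed: List[int] = []

    def check_job(self, job: DeferredJob) -> JobState:
        if "marker" not in job.params:
            return JobState.INVALID
        return JobState.READY if job.params["marker"] in self.markers else JobState.WAIT

    def run_job(self, job: DeferredJob) -> None:
        self.completed.append(job.job_id)  # stand-in for the arbitrary deferred code


def poll_once(queue: List[DeferredJob],
              plugins: Dict[str, DeferredJobPlugin]) -> List[DeferredJob]:
    """One pass of a deferred job runner; returns the jobs that are still waiting."""
    waiting = []
    for job in queue:
        plugin = plugins[job.plugin]
        state = plugin.check_job(job)
        if state is JobState.READY:
            plugin.run_job(job)
        elif state is JobState.WAIT:
            waiting.append(job)
        # JobState.INVALID: the job is dropped
    return waiting
```

A runner built this way would call poll_once repeatedly on a timer; because the runner never looks inside a job, it stays independent of tools and clusters, as the brief describes.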

does Galaxy record tool versions?
by Yury Bukhman 28 Jan '11

Hi, in order to reproduce an analysis, it's good to know not only what tools were used, but also their versions. Is there a way to figure that out from a Galaxy history? I would like to be able to answer questions like "what version of bowtie have I run in an analysis performed 6 months ago?" Thanks. Yury -- Yury V. Bukhman, Ph.D. Associate Scientist, Bioinformatics Great Lakes Bioenergy Research Center University of Wisconsin - Madison 445 Henry Mall, Rm. 513 Madison, WI 53706, USA Phone: 608-890-2680 Fax: 608-890-2427 Email: ybukhman(a)glbrc.wisc.edu

Re: [galaxy-user] Extract sequences from [gtf file] + [genome FASTA file]
by Jennifer Jackson 28 Jan '11

-------- Original Message --------
Subject: Re: [galaxy-user] Extract sequences from [gtf file] + [genome FASTA file]
Date: Thu, 27 Jan 2011 17:23:11 -0700
From: Brian Foley PhD <btf(a)lanl.gov>
To: Jennifer Jackson <jen(a)bx.psu.edu>

Dear Jen,

I am not much of a Galaxy user yet, but a long-time user of GenBank and other databases and sequence analysis tools (phylogenetics software, etc.). A common task I would like to do is obtain a FASTA format file (ideally aligned, but I can do the alignment later very easily) of the regions of sequences hit in a BLAST search on GenBank. It is easy to ask GenBank to give me all (or a selected few) of the sequences hit in the BLAST search, but not so easy to get each sequence "clipped" to the matched region. For example, if I search with the D-loop region of a mammalian mitochondrial genome, I would like to get that region clipped out of all the hundreds of complete mitochondrial genomes. Or, if I search with a mammalian endogenous retrovirus, I would like to get the retroviruses clipped from the complete chromosome entries.

Ideally, I would add one more criterion: I would like to be able to get some number of bases (let's say 100) flanking the matched region, so I could capture the integration sites of endogenous retroviruses, for example, or get the intron flanks of a gene if I was searching with a mammalian gene exon.

The final thing would be to deal with the fact that GenBank BLAST match results often get fragmented. For example, the LTRs of retroviruses (endogenous or not) create a problem. And any large indels or highly variable regions often split one contiguous homologous string into two individual matches split at the indel or variable site.

This looks somewhat similar to the task you describe below, so I am wondering if it is something I can do in Galaxy (or with Galaxy plus a few other tools). GenBank/BLAST will almost give me what I want. The trouble I find is that either I can get the result as a multiple sequence alignment, but with useless sequence names (just the gi number as identifier) and not in FASTA format, or I can get full sequence entries, but without the matched region clipped out. I have asked NCBI/GenBank if they would serve up the results in FASTA format, but they are not responsive on that.

Brian Foley PhD
HIV Databases
btf(a)lanl.gov
http://www.hiv.lanl.gov

On 1/27/11 1:36 PM, "Jennifer Jackson" <jen(a)bx.psu.edu> wrote:
> Hello Karen,
>
> The following general workflow should help you to pull sequences from
> any source.
>
> 1) Cut out the sequence IDs from the query (in this case, a GTF & BED
> file) and sort them.
>    Text Manipulation -> Cut columns from a table
>    Filter and Sort -> Sort
> 2) Convert the target FASTA file to tabular format.
>    Convert Formats -> FASTA-to-Tabular converter
> 3) Join the two datasets based on the sequence ID.
>    Join, Subtract and Group -> Join two Queries
> 4) Convert to FASTA.
>    Convert Formats -> Tabular-to-FASTA
> 5) When starting with a GTF file, there will most likely be duplicates.
>    To remove them, use:
>    NGS: QC and manipulation -> Collapse sequences
>
> Once you create the actual workflow that performs the job, be sure to
> save it so that you can just re-use it whenever you need to perform the
> same task. To do this, from the history pane (rightmost) use Options ->
> Extract workflow and follow the instructions on the form to customize it.
>
> Hopefully this helps,
>
> Jen
> Galaxy team
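Outside Galaxy, the same cut / convert / join / collapse idea from Jen's workflow fits in a few lines of Python. This is a minimal sketch, not what the Galaxy tools do internally: the function names are hypothetical, the identifier is assumed to be the first whitespace-delimited word of each FASTA header, and the join only works if those headers carry the same identifiers as the chosen query column. Keeping one record per identifier stands in for step 5.

```python
from typing import Iterable, Iterator, List, Tuple


def ids_from_column(lines: Iterable[str], column: int) -> List[str]:
    """Step 1: take one tab-separated column (1-based, like Galaxy's c1, c2, ...)."""
    ids = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and UCSC header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= column:
            ids.append(fields[column - 1])
    return ids


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Step 2: yield (identifier, sequence) pairs, i.e. the FASTA as a two-column table."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            words = line[1:].split()
            seq_id, chunks = (words[0] if words else ""), []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def join_to_fasta(ids: Iterable[str], records: Iterable[Tuple[str, str]]) -> str:
    """Steps 3-5: keep records whose ID was requested, once each, and write FASTA."""
    wanted = set(ids)  # a hash lookup, so the IDs do not need to be sorted here
    seen = set()
    out = []
    for seq_id, seq in records:
        if seq_id in wanted and seq_id not in seen:
            seen.add(seq_id)  # drop duplicates, the role step 5 plays
            out.append(f">{seq_id}\n{seq}\n")
    return "".join(out)
```

For example, with a BED file whose fourth column holds transcript names, `join_to_fasta(ids_from_column(bed_lines, 4), read_fasta(fasta_lines))` returns the matching transcripts as FASTA text.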

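Brian's request (clip each hit to the matched region, add about 100 flanking bases, and stitch fragmented hits back together) comes down to coordinate arithmetic once the hit coordinates are known. A minimal sketch, assuming BLAST-style 1-based, inclusive subject coordinates where a minus-strand hit is reported with start > end; the function names and the max_gap threshold are hypothetical, and this is not an existing Galaxy or NCBI tool.

```python
from typing import List, Tuple

# Complement table for reverse-complementing minus-strand hits (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def merge_hits(hits: List[Tuple[int, int]], max_gap: int) -> List[Tuple[int, int]]:
    """Merge hits on one subject and strand whose gap is at most max_gap bases.

    Each hit is (start, end), 1-based inclusive with start <= end.
    Overlapping hits have a negative gap and are always merged.
    """
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(hits):
        if merged and start - merged[-1][1] - 1 <= max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def clip_with_flanks(sequence: str, start: int, end: int, flank: int = 100) -> str:
    """Return the hit plus `flank` bases on each side, clamped to the sequence ends.

    start/end are 1-based inclusive; start > end marks a minus-strand hit,
    which is returned reverse-complemented so it reads in the query's orientation.
    """
    minus = start > end
    low, high = (end, start) if minus else (start, end)
    left = max(0, low - 1 - flank)            # 1-based low -> 0-based index, widened
    right = min(len(sequence), high + flank)  # 1-based inclusive end == 0-based exclusive end
    region = sequence[left:right]
    if minus:
        region = region.translate(COMPLEMENT)[::-1]
    return region
```

Merging before clipping means one flanked region per locus instead of one per fragment; the gap threshold is a judgment call, since a large max_gap could also join two genuinely separate copies on the same chromosome.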