Re: [galaxy-user] Adding the Hydra genome to Galaxy
by Jennifer Jackson
Hello Rob,
We will add this to our to-do list for new genomes. Thanks for sending
the Genbank information!
Next time, if you could send requests to galaxy-user, that would be very
helpful for the team.
Best,
Jen
Galaxy team
On 1/3/11 12:37 PM, Rob Steele wrote:
> Hi Jennifer,
> Would it be possible to get the Hydra genome assembly added to Galaxy?
> It has been published and is available in GenBank under accession number
> ABRM00000000.
>
> Cheers,
> Rob
>
> Rob Steele, Ph.D.
> Professor
> D240 Medical Sciences I
> Department of Biological Chemistry
> School of Medicine
> University of California, Irvine
> Irvine, CA 92697-1700
>
> phone: 949-824-7341
> e-mail: resteele(a)uci.edu
> fax: 949-824-2688
> web: http://polyp.biochem.uci.edu/wiki/index.php/Main_Page
>
--
Jennifer Jackson
http://usegalaxy.org
11 years, 3 months
Peep view for history elements broken on IE?
by Peter
Hi all,
I've recently been testing Galaxy on Microsoft Internet Explorer,
IE6 and now IE7. It seems that the "peep" view for history entries
isn't supported. The history elements' names are not links, so
clicking on them does not make them expand to show the info
(e.g. number of sequences or file size, and the start of the data).
This happens both on the public Galaxy instance at Penn State
(http://usegalaxy.org) and our local Galaxy instance.
Is this a known issue?
Peter
11 years, 4 months
fastx_clipper.xml and FASTQ Illumina 1.3+?
by Peter
Hi all,
Should the fastx_clipper.xml allow Illumina 1.3+ FASTQ as well as
Sanger and Solexa FASTQ (and FASTA)?
i.e. replace this:
<param format="fasta,fastqsolexa,fastqsanger" name="input" type="data"
label="Library to clip" />
with this:
<param format="fasta,fastqsolexa,fastqillumina,fastqsanger"
name="input" type="data" label="Library to clip" />
Peter
11 years, 4 months
Gene Name in Cufflink/compare/diff
by Matteo Bovolenta
Hi all,
when I run an RNA-seq analysis using TopHat, Cufflinks, Cuffcompare and
Cuffdiff by aligning my data to the RefSeq genes, I obtain tables from
Cufflinks/Cuffcompare/Cuffdiff which do not include the gene name, but only
the NM_ identifier.
Does anyone know how I can obtain all the tables with the gene name?
Thank you all very much for the support,
Best Regards,
Matteo
--
Matteo Bovolenta, PhD
Dipartimento di Medicina Sperimentale e Diagnostica
Sezione di Genetica Medica
Università di Ferrara
Via Fossato di Mortara, 74
44100 Ferrara
tel +39 0532 974449(office)
tel +39 0532 974502 (lab)
fax +39 0532 236157
email bvlmtt(a)unife.it
http://www.unife.it/medicina/geneticamedica
http://www.bio-nmd.eu
registered in ORPHANET
http://www.orpha.net
NOTA DI RISERVATEZZA: ai sensi del D.Lgs. 196/2003 si precisa che le
informazioni contenute in questo messaggio e nei relativi allegati
sono riservate ed a uso esclusivo del destinatario. Qualora il
messaggio in parola Le fosse pervenuto per errore, La invitiamo ad
eliminarlo senza copiarlo, a non inoltrarlo a terzi e a non farne
alcun uso, dando gentilmente comunicazione all'indirizzo del mittente:
bvlmtt(a)unife.it Grazie.
CONFIDENTIALITY NOTICE: this message together with its annexes may
contain confidential, proprietary or legally privileged information
and is intended only for the use of the addressee named above. No
confidentiality or privilege is waived or lost by any mistransmission.
If you are not the intended recipient of this message you are hereby
notified that you must not use, disseminate, copy it in any form or
take any action in reliance on it. If you have received this message
in error please delete it and any copies of it and kindly inform the
sender of this e-mail by bvlmtt(a)unife.it Thank you
11 years, 4 months
Extract sequences from [gtf file] + [genome FASTA file]
by Karen Tang
Hi Galaxy people,
I have transcripts predicted by Cufflinks that are in a gtf
file. How can I extract the sequences corresponding to those
transcripts, using Galaxy?
[Cufflinks transcript predictions in gtf file] + [Genome sequence in FASTA file] ---> [FASTA file of transcript sequences]
My genome is a custom genome (not at UCSC).
---------
I'll also need to do the same thing, except my predicted
transcripts are in a Scripture bed file.
Thanks for your help!
Karen Tang :)
Plant Biology
University of Minnesota
11 years, 4 months
Galaxy for gene expression comparison
by Martin, David A.
Hello,
I am comparing RNA expression in two groups of rats, a drug-treated group against a control group. There are 10 biological replicates in each group. I am unsure of how to flow this analysis through Galaxy using Tophat followed by Cufflinks/compare/diff. Should the files for each group be merged at any point? I would think they should be kept separate in order to properly account for the spread across animals. I am just a little unsure of how to group the files in Galaxy, and where to differentiate biological and technical replicates.
On a different note, is there a way to control the bowtie mapping parameters more closely when using tophat?
Thank you for any kind of knowledge on these matters!
-David Martin
11 years, 4 months
Python error when running Bowtie for Illumina
by Weng Khong Lim
Hi all,
I'm new to next-gen sequencing, so please be gentle. I've just received a
pair of Illumina FASTQ files from the sequencing facility and intend to map
them to the hg19 reference genome. I first used the FASTQ Groomer utility to
convert the reads into Sanger reads. However, when running Bowtie for
Illumina on the resulting dataset under default settings, I received the
following error:
An error occurred running this job: Error aligning sequence. requested
number of bytes is more than a Python string can hold
Can someone help point out my mistake? My history is accessible at
http://main.g2.bx.psu.edu/u/wengkhong_lim/h/chip-seq-pilot-batch
Appreciate the help!
Weng Khong, LIM
Department of Genetics
University of Cambridge
E-mail: wkl24(a)cam.ac.uk
Tel: +447503225832
11 years, 5 months
January 31, 2011 Galaxy Development News Brief
by Jennifer Jackson
January 31, 2011 Galaxy Development News Brief
http://bitbucket.org/galaxy/galaxy-central/wiki/Features/DevNewsBrief/201...
----
Get Galaxy!
http://bitbucket.org/galaxy/galaxy-central/wiki/GetGalaxy
* new: % hg clone http://www.bx.psu.edu/hg/galaxy galaxy-dist
* upgrade: % hg pull -u -r 95d65755ac69
----
What's New
= Workflow Additions =
1) Usability improvements for workflow annotations
* Workflow annotation is now shown at the top of the page.
* Step annotations are shown in the step header rather than at the bottom.
example: Annotation
http://bitbucket.org/galaxy/galaxy-central/wiki/Features/DevNewsBrief/201...
2) Easier to move workflows directly from one Galaxy instance to another
* Workflow download/export page now provides URL that can be used to
directly import a workflow from one instance to another.
example: URL import
http://bitbucket.org/galaxy/galaxy-central/wiki/Features/DevNewsBrief/201...
3) New Parameter settings for global application or specific actions
* Workflow parameters are a new feature we've added to simplify reuse of
workflows, and to allow for easier variation of parameters when
re-running a workflow.
* Instead of filling in explicit values when building a workflow, you
can now use flexible parameters. To specify a workflow parameter,
simply use a tag like ${my_variable_name} in any tool input field or in
a rename dataset action field.
* The workflow shown below has two parameters, as shown in the Workflow
Parameters display in the top right of the editor window. You can see
the ${filter_condition} parameter in the right panel in both the tool
input and the rename action.
** Note that while this ${filter_condition} is only used in a single
step in this simple demo workflow, variables can be used across steps.
example: Parameters
http://bitbucket.org/galaxy/galaxy-central/wiki/Features/DevNewsBrief/201...
* As the inputs are filled in the Workflow Parameters box, seen in the
runtime example below, the new values will be reflected in all workflow
steps and will be used when the workflow is executed.
example: Runtime display
http://bitbucket.org/galaxy/galaxy-central/wiki/Features/DevNewsBrief/201...
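The ${name} tags described in item 3 behave much like ordinary template placeholders. The following is a minimal Python sketch of that idea only, not Galaxy's implementation: the field names in `steps` and both helper functions are hypothetical, and only the ${filter_condition} tag comes from the brief.

```python
import re
from string import Template

# Matches tags of the form ${my_variable_name}.
PARAM_RE = re.compile(r"\$\{(\w+)\}")


def find_parameters(fields):
    """Collect the unique parameter names used across all field values."""
    names = set()
    for value in fields.values():
        names.update(PARAM_RE.findall(value))
    return sorted(names)


def fill_parameters(fields, values):
    """Substitute one set of runtime values into every field.

    Raises KeyError if a tag has no value supplied.
    """
    return {key: Template(text).substitute(values) for key, text in fields.items()}


# Hypothetical stand-ins for a tool input and a rename action.
steps = {
    "filter_tool_input": "${filter_condition}",
    "rename_action": "Filtered on ${filter_condition}",
}

print(find_parameters(steps))
print(fill_parameters(steps, {"filter_condition": "c1=='chr1'"}))
```

Because the same value is substituted wherever the tag appears, one runtime entry updates both the tool input and the rename action, which mirrors the behaviour described for the Workflow Parameters box.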
4) General workflow tuning
* HideDataset Action will no longer show in the workflow editor. The
ability to manually create one of these actions has been deprecated in
favor of the workflow outputs approach.
* Workflow run results can now be sent to a new history instead of the
current one.
* Ordering of workflow steps is now sorted based on the layout in the
workflow editor, arranged based on distance from top left corner of the
editor. This won't affect existing workflows until re-saved.
* Workflows that contain steps expecting tools that are unavailable (as
might be the case for a workflow imported from another Galaxy instance)
will now have problem nodes marked with an error state. The workflow
cannot be saved until the steps are removed or the tools are added to
the current Galaxy instance.
= Deferred Jobs & Managed Transfers =
These components are under rapid development and interfaces should be
considered experimental. They can be enabled by setting
'enable_beta_job_managers = True' in universe_wsgi.ini.
1) Deferred Jobs
* A generic method for creating a dependency on an event before executing
arbitrary code has now been defined in:
galaxy-dist/lib/galaxy/job/deferred/__init__.py
* The deferred job runner loads plugins found in the same directory
which implement the necessary methods check_job() and run_job().
check_job() returns a state which informs the deferred job runner
whether it is okay to execute the run_job() method.
* The deferred job runner is independent from the regular tool-related
job runner and is not coupled with tools, nor does it have any
integrated cluster support.
* No documentation is provided for the format of a plugin at this time,
but a sample plugin will be included at a later date.
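Since no plugin format is documented yet, the following is only a toy Python sketch of the check_job()/run_job() contract described above. It does not use Galaxy's code: the state names, the job dictionary, and the `ToyDeferredRunner` class are all assumptions made for illustration.

```python
import os
import tempfile

# Hypothetical state names; the brief does not list the real ones.
WAIT, READY = "wait", "ready"


class FileExistsPlugin:
    """Toy plugin: a job becomes runnable once the file it waits on exists."""

    def check_job(self, job):
        # Report whether the event this job depends on has happened.
        return READY if os.path.exists(job["path"]) else WAIT

    def run_job(self, job):
        # Arbitrary code executed once check_job() reports READY.
        with open(job["path"]) as handle:
            job["result"] = handle.read()


class ToyDeferredRunner:
    """Stand-in runner that polls a plugin, independent of any tool runner."""

    def __init__(self, plugin):
        self.plugin = plugin
        self.waiting = []

    def submit(self, job):
        self.waiting.append(job)

    def poll_once(self):
        still_waiting = []
        for job in self.waiting:
            if self.plugin.check_job(job) == READY:
                self.plugin.run_job(job)
            else:
                still_waiting.append(job)
        self.waiting = still_waiting


runner = ToyDeferredRunner(FileExistsPlugin())
workdir = tempfile.mkdtemp()
job = {"path": os.path.join(workdir, "transfer.done")}
runner.submit(job)
runner.poll_once()          # file is missing, so the job keeps waiting
with open(job["path"], "w") as handle:
    handle.write("done")
runner.poll_once()          # file now exists, so run_job() is called
print(job.get("result"))
```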
2) Transfer Manager
* Galaxy can now spawn persistent transfers of (unauthenticated) http
and https URLs via the code in:
galaxy-dist/lib/galaxy/job/transfer_manager.py
* The transfer manager is accessible in-application at app.transfer_manager.
* Transfers are daemonized and thus not influenced by Galaxy restarts,
although a loss of database connection (restarting the database server)
will cause transfers to fail.
* A transfer can be polled for progress via JSON-RPC requests to its
socket. An interface for this request is available in the
transfer_manager class.
* Future enhancements will allow for authenticated http/https and scp.
----
Updated & Improved
= Data Content =
* New "Get Data" source: modENCODE modMine server.
* New Genomes: format: Genome [dbkey]
** Pseudomonas aeruginosa (str. PA7) [16720]
** Pseudomonas aeruginosa (str. PAO1) [pseuAeru]
** Pseudomonas aeruginosa (str. UCBPP-PA14) [386]
** Silkworm (Bombyx mori str. p50T) [Bombyx_mori_p50T_2.0]
** Rice pathogen: Xanthomonas oryzae (str. KACC10331) [12931]
** Rice pathogen: Xanthomonas oryzae (str. MAFF 311018) [16297]
** Rice pathogen: Xanthomonas oryzae (str. PXO99A)
[Xanthomonas_oryzae_PXO99A]
** Little Brown Bat (Myotis lucifugus) [myoluc2]
= Current Tools =
* Optional 'min' and 'max' attributes for tool integer and float parameters.
* Add --max and --un options to Bowtie.
* Validators enforce min/max values and generated error messages for
invalid values.
* Add Standard deviation operation to grouping tool.
* Enhance histogram tool to allow plotting as frequency/counts.
* Allow fastx_toolkit clipper wrapper to work on fastqsanger formatted
files.
* SAM metadata detection and setting sped up dramatically upon import.
* Wig-to-bigWig converter tool default parameters now same as UCSC's.
* Cufflinks tool suite updates:
** Update Cufflinks tool suite wrappers to support v0.9.3.
** Add support for bias correction to Cufflinks and Cuffdiff; bias
correction improves transcript quantitation results (FPKM values).
** Enable Cuffcompare to use sequence data so that it can generate data
for use by Cuffdiff. Cuffcompare uses sam_fa_indices.loc to find locally
cached genome sequence data and indices that are needed for this option.
** Add normalization support to Cufflinks and Cuffdiff and replicate
support for cuffdiff.
= New Tools =
* BED-to-bigBed converter tool now under "Convert Formats".
** Converts sorted BED files into UCSC's bigBed format.
** Requires bedToBigBed in PATH.
* BLAST+ tools are now commented out in tool_conf.xml.sample.
** Not in Galaxy main, for local instance use.
** If you run the BLAST+ tools at your site, please be sure to uncomment
them if updating to the latest tool_conf.xml.sample (will be default at
next update).
* Add SAMtools flagstat tool.
* Add CCAT ChIP-seq peak/region caller.
* Add BWA wrapper for SOLiD.
= Histories =
* Estimated size is now displayed for very large text-based (non-binary)
datasets.
* Always show Galaxy masthead and enable Saved Histories to work with
and without panels.
* New "Copy Datasets" link added under History Options dropdown.
** Dataset and history IDs are now encoded.
** JS dropdown used to change source history, so you no longer have to
switch desired source history to be "active".
** Remove link in "edit attributes" of datasets.
** Simplified destination interface by using a single select box by
default, but also providing a link to show checkboxes for multiple
destination histories.
** Use newly imported Inflector library to get correct plural/singular
nouns on actions.
** Added arrow between source and destination.
** Removed checkbox for copying to a new history. Instead, create new
history if new history field is not blank.
= Data Libraries =
* Creating job information (stdout/stderr) is now available on the
library item info page, which is helpful when library uploads fail.
= Trackster =
* Add bigWig display to trackster. Automatically converts wig to
bigwig if needed (NOTE: datatypes_conf.xml.sample has been edited to
add the new converter, you must update datatypes_conf.xml to use it).
The converter requires that wigToBigWig be in the PATH, but no other
tools are needed to view bigwig files as they are provided by
bx_python.
* Tuning
** Fix track preferences not being applied
** Fix chroms not being selectable when a new track browser is created
** Fix ReferenceTrack not working with filters
** Fix visual analytics error when tool configuration has changed
** Fix visualization saving on Chrome by using $.each instead of for loop
** Fix shared visualizations
** Fix tracks fetching from data provider when indexer returned None,
and when
** Fix BAM reads without cigar string.
= Sample Tracking =
* Additions and tuning to improve tracking (complete documentation will
be available soon).
= User Interface (UI) =
* Version info is now printed in history item for Bowtie, BWA, Lastz,
TopHat, Samtools, Cuffdiff, Cufflinks, Cuffcompare, BFAST, and PerM.
* Turn off web browser auto-complete for tool search (includes workflows).
* Grid changes resulting in better readability:
** Better page number display.
** Use "~" instead of "about"
example: "about 2 hours ago" -> "~ 2 hours ago".
** Cell padding decreased.
** Added new "nowrap" parameter to prevent text from being wrapped.
Currently only used for tags to prevent "X tags" from wrapping in the
middle.
= Application Programming Interface (API) =
* Still 'alpha', but: Initial pass completed for forms, request_type,
users and roles. See README and examples in source.
= Source =
* Galaxy now runs with system python on 64-bit mac kernel.
* Enhance select parameter wrapper objects to provide access to
additional fields by name for dynamically generated select lists (i.e.
dynamic_options).
example: use ${param.fields.path} to access a path field
* Updated the XML in filter specification for output files. The closing
filter tag can now be on a separate line to use as an actual filter.
* Implemented 'from_work_dir' attribute for tool outputs. Using this
attribute matches a file in the working directory to a tool output/HDA;
when a tool finishes, the file in the working directory is automatically
copied to the HDA. Hence, it's no longer necessary for tool wrapper
scripts to manually copy tool output files to HDA files.
* More programmatic control of page numbers on grid (custom UI).
* Many formerly undocumented options have been added to the
universe_wsgi.ini.sample file. Please compare your working copy with
the new .sample and determine whether any of the options are relevant to
your environment.
* The "welcome page" found at static/welcome.html, has been renamed to
static/welcome.html.sample.
* The UCSC Genome Browser now supports loading data via https. If you
implemented the former method of rewriting Galaxy URLs from https to http
in Apache, as per the wiki documentation, you will need to remove it there
and allow the browser through your authentication scheme over https.
* Explicitly convert autocomplete dropdown values on refresh_on_change.
* Adjust image links in tools to work with a proxy prefix (thanks Brad
Chapman).
= Bug Fixes =
* Ensembl GTF files are recognized correctly.
* Make Tophat wrapper compatible with Python 2.4 by removing
try-except-finally.
* Fix unicode error for dataset peeks.
* Sanitize tool links in tool menu so that they can be searched. Use
only lowercase letters, numbers, and underscores in links to ensure
cross-browser functionality.
* Fix for building form element for boolean tool parameter when default
state is configured as checked in tool configuration, but user provided
value is non-checked. Fixes an issue seen in workflows that prevents
saving the unchecked state under described conditions.
* Fix typo in BFAST.
* Trackster: fix BAM display bugs, Dense display bug, better dataset
selection.
* Add validator to Tophat's segment mismatches parameter.
* Turn off web browser autocomplete for tool search in workflows.
* Added the ctypes egg so the DRMAA egg will now work under Python 2.4.
* Bug fixes to the Mac OS X launcher in the galaxy-dist/contrib/ folder
(from Florent Angly).
* Python reports the wrong platform when running a 32-bit Python on
64-bit Linux, so Galaxy now forces the correct platform in this
environment (thanks David Hoover).
* Fix Chrome not auto-saving changes to workflow checkboxes.
----
About Galaxy
Galaxy is supported in part by NSF, NHGRI, the Huck Institutes of the
Life Sciences, and The Institute for CyberScience at Penn State.
Core Team http://bitbucket.org/galaxy/galaxy-central/wiki/GalaxyTeam
Use Galaxy! http://usegalaxy.org
GalaxyProject.org http://galaxyproject.org
Development Home http://bitbucket.org/galaxy/galaxy-central
----
Galaxy Team
January 31, 2011
11 years, 5 months
does Galaxy record tool versions?
by Yury Bukhman
Hi,
in order to reproduce an analysis, it's good to know not only what tools were used, but also their versions. Is there a way to figure that out from a Galaxy history? I would like to be able to answer questions like "what version of bowtie have I run in an analysis performed 6 months ago?"
Thanks.
Yury
--
Yury V. Bukhman, Ph.D.
Associate Scientist, Bioinformatics
Great Lakes Bioenergy Research Center
University of Wisconsin - Madison
445 Henry Mall, Rm. 513
Madison, WI 53706, USA
Phone: 608-890-2680 Fax: 608-890-2427
Email: ybukhman(a)glbrc.wisc.edu
11 years, 5 months
Re: [galaxy-user] Extract sequences from [gtf file] + [genome FASTA file]
by Jennifer Jackson
-------- Original Message --------
Subject: Re: [galaxy-user] Extract sequences from [gtf file] + [genome
FASTA file]
Date: Thu, 27 Jan 2011 17:23:11 -0700
From: Brian Foley PhD <btf(a)lanl.gov>
To: Jennifer Jackson <jen(a)bx.psu.edu>
Dear Jen,
I am not much of a Galaxy user yet, but a long time user of GenBank and
other databases and sequence analysis tools (Phylogenetics software, etc).
A common task I would like to do, is obtain a FASTA format file (ideally
aligned, but I can do the alignment later very easily) of the regions of
sequences hit in a BLAST search on GenBank.
It is easy to ask GenBank to give me all (or the selected few) sequences
hit in the BLAST search, but not so easy to get each sequence "clipped" to
the matched region. For example, if I search with the D-loop region of a
mammal mitochondrial genome, I would like to get that region clipped out of
all the hundreds of complete mitochondrial genomes. Or if I search with a
mammalian endogenous retrovirus, get the retroviruses clipped from the
complete chromosome entries.
Ideally, I would add one more criterion: I would like
to be able to get some number of bases (let's say 100) flanking the matched
region. So I could capture the integration sites of endogenous
retroviruses, for example. Or get the intron flanks of a gene if I was
searching with a mammalian gene exon.
The final thing would be to deal with the fact that GenBank BLAST match
results often get fragmented. For example the LTRs of retroviruses
(endogenous or not) create a problem. And any large in/dels or highly
variable regions often split one contiguous homologous string into two
individual matches split at the in/del or variable site.
This looks somewhat similar to the task you describe below, so I am
wondering if it is something I can do in Galaxy (or with Galaxy plus a few
other tools).
GenBank/BLAST will almost give me what I want. The trouble I find is
that either I can get the result as a multiple sequence alignment but with
useless sequence names (just the gi number for identifier) and not in FASTA
format, or I can get full sequence entries but not the matched region
clipped out. I have asked NCBI/GenBank if they would serve up the results
in FASTA format, but they are not responsive on that.
Brian Foley PhD
HIV Databases
btf(a)lanl.gov
http://www.hiv.lanl.gov
On 1/27/11 1:36 PM, "Jennifer Jackson" <jen(a)bx.psu.edu> wrote:
> Hello Karen,
>
> The following general workflow should help you to pull sequences from
> any source.
>
> 1) cut out the sequence IDs from the query (in this case, a GTF & BED
> file) and sort them.
> Text Manipulation -> Cut columns from a table
> Filter and Sort -> Sort
> 2) convert the target fasta file to tabular format
> Convert Formats -> FASTA-to-Tabular converter
> 3) join the two datasets based on the sequence ID
> Join, Subtract and Group -> Join two Queries
> 4) convert to FASTA
> Convert Formats -> Tabular-to-FASTA
> 5) when starting with a GTF file, there will most likely be duplicates.
> To remove, use:
> NGS: QC and manipulation -> Collapse sequences
>
> Once you create the actual workflow that performs the job, be sure to
> save it so that you can just re-use it whenever you need to perform the
> same task. To do this, from the history pane (rightmost) use Options ->
> Extract workflow and follow the instructions on the form to customize.
>
> Hopefully this helps,
>
> Jen
> Galaxy team
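For readers who want to see the logic of the quoted steps outside Galaxy, here is a rough Python sketch of the same cut / convert / join / convert-back sequence. It is an illustration under assumptions, not what the Galaxy tools run internally: the function names are hypothetical, and the ID column index must be chosen to match your own file.

```python
def cut_ids(table_lines, column=0):
    """Steps 1 and 5: cut one column of IDs, drop duplicates, and sort."""
    ids = set()
    for line in table_lines:
        if line.strip() and not line.startswith("#"):
            ids.add(line.rstrip("\n").split("\t")[column])
    return sorted(ids)


def read_fasta(lines):
    """Step 2: convert FASTA lines to an {id: sequence} table."""
    records, current = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence ID.
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def join_to_fasta(ids, records):
    """Steps 3 and 4: join on sequence ID and write the matches as FASTA."""
    out = []
    for name in ids:
        if name in records:
            out.append(">" + name)
            out.append(records[name])
    return "\n".join(out)


fasta = [">seq1 example", "ACGT", "AC", ">seq2", "GGGG"]
table = ["seq2\tx", "seq1\ty", "seq2\tz", "seq3\tq"]
print(join_to_fasta(cut_ids(table), read_fasta(fasta)))
```

IDs with no matching FASTA record (seq3 in the toy data) are silently dropped, which is also what an inner join on the ID column does.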
11 years, 5 months