We've just upgraded our Galaxy server (from revision 7148:17d57db9a7c0 to revision 10422:a886bc3ae924), and
have found that an NGS training workflow that one of us set up and published is no longer accessible. We now get a stack trace
when we try to run it (screenshot attached); if we try to save it to file, we get "page not found" from the browser, but no errors
are reported in the web logs or the Galaxy log, as far as we can see.
Is there any way we can recover the workflow?
How can we avoid this happening in future upgrades? This time it is not a major problem, but we would not want this to happen to
production workflows. Should we save-to-file all workflows as a precaution before an upgrade? (Would that help?)
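One way to take that precaution would be to export every workflow to a JSON (.ga) file before upgrading. The following is only a sketch: it assumes the BioBlend Python library is installed and the server's API is enabled, and the function name `backup_workflows`, the output directory, and the URL and key in the usage comment are placeholders. Whether a file exported before an upgrade re-imports cleanly afterwards has not been tested here.

```python
import json
import os
import re


def backup_workflows(workflow_client, out_dir):
    """Write each workflow visible to the API user to out_dir as JSON.

    workflow_client is expected to offer get_workflows() and
    export_workflow_dict(workflow_id), as BioBlend's WorkflowClient does.
    Returns the list of file paths written.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for wf in workflow_client.get_workflows():
        # Keep file names filesystem-safe; the workflow id keeps them unique.
        safe_name = re.sub(r"[^A-Za-z0-9_.-]+", "_", wf["name"])
        path = os.path.join(out_dir, f"{safe_name}_{wf['id']}.ga")
        with open(path, "w") as handle:
            json.dump(workflow_client.export_workflow_dict(wf["id"]), handle, indent=2)
        written.append(path)
    return written


# Usage (hypothetical URL and API key):
#   from bioblend.galaxy import GalaxyInstance
#   gi = GalaxyInstance("https://galaxy.example.org", key="YOUR_API_KEY")
#   backup_workflows(gi.workflows, "workflow_backup")
```

Keeping the export separate from the client object means the same function can be tried against a test instance before it is relied on for production workflows.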
The workflow included steps as below.
Grateful for any suggestions.
First Part : checking GC content
*Upload the sampling.fasta file or get it from the shared data in Galaxy.
*Use geecee from EMBOSS
*Remove beginning (Text manipulation), first line
*Convert as tabular (Text manipulation) or cut (Text manipulation)
*Histogram (Graph/Display Data) on col2
*Compute data (Text manipulation) (data4 convert; col2>0.2)
*Count data (statistics on col3). If it is worth it, we can remove the sequences whose GC content is too low
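For reference, the first part can be approximated outside Galaxy with a short script. This is only a sketch: it assumes plain FASTA input, reuses the 0.2 cutoff from the Compute step, and the function names are made up for illustration. It computes the plain G+C fraction and does not claim to reproduce geecee's output format.

```python
def read_fasta(lines):
    """Yield (title, sequence) pairs from an iterable of FASTA lines."""
    title, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        elif line and title is not None:
            chunks.append(line)
    if title is not None:
        yield title, "".join(chunks)


def gc_fraction(sequence):
    """Fraction of G and C bases in the sequence (0.0 for an empty one)."""
    if not sequence:
        return 0.0
    seq = sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_table(lines, cutoff=0.2):
    """Return rows of (title, gc, gc > cutoff), like the Compute step."""
    return [(t, gc_fraction(s), gc_fraction(s) > cutoff) for t, s in read_fasta(lines)]


# Usage:
#   with open("sampling.fasta") as handle:
#       rows = gc_table(handle)
#   kept = sum(1 for _, _, passes in rows if passes)
```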
Second part : sequence length
*Compute Sequence length (FASTA manipulation)
*Summary Statistics (Statistics) on col2
*Filter by length (FASTA manipulation); 800
*Line/word/character count (Text manipulation)
*Blastn against fungi db: yeast, only one hit to show (advanced options)
*Count the lines to know how many sequences have a hit (Text manipulation)
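The length steps of the second part can be sketched in the same way. Assumptions: 800 is treated as a minimum length (the step above does not say whether it is a minimum or a maximum), the input is plain FASTA, and the function names are illustrative only. The Blastn step is not covered.

```python
import statistics


def fasta_lengths(lines):
    """Return {title: sequence length} for an iterable of FASTA lines."""
    lengths, title = {}, None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            title = line[1:]
            lengths[title] = 0
        elif title is not None:
            lengths[title] += len(line)
    return lengths


def summarize(lengths):
    """Basic summary statistics over the length column."""
    values = list(lengths.values())
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


def titles_at_least(lengths, min_length=800):
    """Titles of sequences whose length is >= min_length."""
    return [title for title, n in lengths.items() if n >= min_length]
```

Counting the entries returned by `titles_at_least` corresponds to the line count taken after the filter step.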
Bonus track : removing sequences with low GC content
*Select (filter and sort) on data 6 where matching = True
*FASTA-to-Tabular (FASTA manipulation) the input file sampling.fasta (2 col for the title)
*Join two datasets (Join, Subtract and Group) on the columns c1
*Tabular-to-FASTA (FASTA manipulation) of the previous results.
*You can see how many sequences you took away (click on the dataset name in History) and it should correspond to the number of True in Count.
A minor annoyance that I have found with the current implementation of "Tophat for Illumina (version 1.5.0)" on the public usegalaxy.org site:
When you submit sequences for alignment, the dropdown list of available genomes gives 2 dog genome choices, BUT both are labeled the same: "Dog(Canis lupus familiaris): canFam2".
After trial and error I found that the second (lower on the list) choice is actually "Dog(Canis lupus familiaris): canFam3.1".
Hopefully this can be corrected at some point - maybe when the next release comes out from the Broad.
Michael H. Court, BVSc, PhD, Diplomate ACVA
Professor and William R. Jones Endowed Chair
Individualized Medicine Program
Department of Veterinary Clinical Sciences
College of Veterinary Medicine, Washington State University
100 Grimes Way, Pullman, WA 99164-6610
Office: 509-335-0817 Cell: 774-287-7082 Fax: 509-335-0880
I am a graduate student at the University of Notre Dame. I have installed a
personal instance of Galaxy and have been developing tools there as a part
of my research.
Everything worked well on my laptop until recently, when I could not view
the 'History' panel, no matter which browser (Firefox, Chrome, Safari) I
used on my laptop (Windows 7 OS). Everything works absolutely fine and I
can view 'History' when I am using a desktop or any other laptop.
I have checked this across different operating systems and also ensured
that no changes were made with the Firewall setup of my laptop.
I have attached a print screen of what the Galaxy page looks like.
It would be great if I could get some help with this.
Thanks a lot in advance,
Is anyone who had developed histories on Galaxy online before the
changeover getting red error messages after the changeover for tophat or
I have now tried a few times to move forward without luck and will remove
at least one of my histories and start all over again just in case there is
some kind of mismatch in the processing.
Before I remove them all and start them all over, I would appreciate
knowing if anyone else has identified where the block is.
Hello, I have been waiting a few weeks to process some RNA seq datasets
but woke up this morning with lots of steps red. I thought it just might
be because of the movement of the system but I processed steps for some
histories and everything has turned red.
I also noticed that online Galaxy now has RNAseq steps separated into two
sections--does this have something to do with the problems?
Duke University Medical Center
I'm trying to use the bowtie2 wrapper but it generates an empty output
with the "no peek" sign. I have successfully recreated the analysis with the
same datasets on the shell, so it must be something to do with Galaxy, but
I don't know where to look. Any clues? Thanks
I'm using bowtie2-2.1.0 and Galaxy:
user: Nate Coraor <nate(a)bx.psu.edu>
date: Thu Sep 26 11:02:58 2013 -0400
summary: Bugfix for tool-to-destination mapping, tool ids are
lowercased but the mapping id was not lowercased
Mikel Egaña Aranguren, Ph.D.
Although I purged all data from my Galaxy account and double checked that
my account is actually empty by looking at Saved Histories/Advanced
Search/status:all, my account indicates that I am using half of it
(as shown in User/Preferences).
Moreover, this prevents me from actually using more than half of my
account, which shows that it is not a display problem.
It is also not a refresh rate problem since this problem started several
Could you please help me with that?
All the best,
Itys COMET, Ph.D. - Helin Group
BRIC - University of Copenhagen
Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark
Phone: +45-353 25602
Fax: +45-353 25669
Web page: http://www.bric.ku.dk/research/Helin_Group/
Hello, I have been using the compute function in galaxy to replace
sequences with delimiters for processing my reads. I realized that the
replace "code" in compute no longer works. Is there any other way to
replace sequences with delimiters?
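If the replace expression in Compute stays unavailable, the same substitution could be done outside Galaxy (or wrapped as a small custom tool). A minimal sketch, assuming tab-separated input with the read in a known column; the function name, the column index, the target sequence and the delimiter are all placeholders, since the original expression is not shown above.

```python
def replace_in_column(lines, column, target, delimiter):
    """Replace every occurrence of target in one column of tab-separated lines.

    column is 0-based. Lines with too few columns are passed through unchanged.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if column < len(fields):
            fields[column] = fields[column].replace(target, delimiter)
        yield "\t".join(fields)


# Usage (hypothetical file name and sequence):
#   with open("reads.tabular") as handle:
#       for row in replace_in_column(handle, 1, "GATCGG", "|"):
#           print(row)
```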
All good thoughts, and if I remember correctly, custom software can
indeed be incorporated into Galaxy through use of the "Toolshed". I'll
check into this with Jennifer.
Senior Core Laboratory Research Staff
Advanced Genome Technologies Core
University of Vermont
Vermont Cancer Center
149 Beaumont ave
Health Science Research Facility 303/305
Burlington Vermont 05405
On 10/6/2013 9:59 AM, Jing Yu wrote:
> Dear Scott,
> I think what you propose is doable.
> You may
> 1. use a 16s or gyrase DNA sequence as feeds to blast against your
> data to get the relative sequences,
> 2. and then use the sequences as feeds to blast against your
> nucleotide database with appropriate filters.
> There are several ways to make the steps. For example, you may already
> have the 16s sequence from assembly against a reference genome.
> And for Step 2, if you are not blasting thousands of times a day, and
> believe in the recent stability of NCBI, then a simple web_blast code
> will do the trick. Otherwise, since the local blast+ toolkit doesn't
> provide the equivalent organism filters, you'll have to work a wee bit
> on it:
> Make a nucleotide database for Prokaryotes.
> Search txid561[ORGN] on http://www.ncbi.nlm.nih.gov/nuccore (this is
> for Escherichia as an example),
> Send to 'File' -> Format ->GI List
> When you run Blast, use this GI list as the value of this argument:
> Then parse the Blast result.
> Most of these can be automated with some code, but I don't know how to
> incorporate it into Galaxy.
> On 4 Oct 2013, at 23:52, Scott Tighe <scott.tighe(a)uvm.edu> wrote:
>> Dear Jing
>> What you have outlined below is perfect.
>> I wonder how hard it would be to design a few filters that only look
>> at certain genes and/or filter model organisms out of the dataset.
>> For example, say you want only data for 16s or only gyrase, but no
>> /E.coli/ and no /Pseudomonas aeruginosa/
>> Scott Tighe
>> On 9/25/2013 12:06 AM, Jing Yu wrote:
>>> Hi Scott,
>>> My first thought is:
>>> 1. Remove rDNA sequences (and/or other well known highly-conserved
>>> sequences to reduce the workload in step 2).
>>> 2. Blast, then remove sequences with > (say 99%) match to > (say 5)
>>> genus. (Optional if step 1 is already good enough)
>>> For step 1:
>>> Build a fasta file of the chosen highly conserved sequences, and
>>> use it as a feed to blast against your MiSeq result.
>>> Remove positive hits.
>>> For step 2:
>>> Blast remaining MiSeq sequences against NCBI (or whatever) database.
>>> Remove if it hits more than n genus.
>>> On 24 Sep 2013, at 22:17, Scott Tighe <scott.tighe(a)uvm.edu> wrote:
>>>> Jing et al
>>>> Thank you for the offer to write some code to help advance the
>>>> metagenomics arena. It is certainly needed.
>>>> The problem with megablast and shotgun metagenomics is well known:
>>>> without proper understanding and the correct software, it
>>>> will yield very misleading and in many cases incorrect data. For
>>>> those of us who wish NOT to move to a protein level of comparison
>>>> for specific reasons, we are stuck.
>>>> *The Problem:*
>>>> If I megablast 50 million sequences from a HiSeq run, millions of
>>>> rRNA sequences will have a 99% match to all microbial rRNA GenBank
>>>> deposits. Not surprising, since rRNA is highly conserved. The
>>>> difference between E. coli and Shigella is 1 to 2 bases for the full
>>>> 1540 bp 16s. So 16s is not useful for Genus level, and certainly
>>>> not Species.
>>>> *So what happens:*
>>>> The returned matches will have many hits to whatever model organism
>>>> is in GenBank. For example, E. coli has 13000 entries for rRNA and
>>>> Sphaerotilus has 3 entries for rRNA. If the blasted sequence
>>>> matches both, the results will mislead the investigator to think
>>>> they have 13000 hits to E. coli, EVEN if the microbe is Sphaerotilus.
>>>> *The cure?:*
>>>> Is there a way to filter/remove all such hits? Let's say, for
>>>> example, that a result has a first match (say E. coli) at >99%, a
>>>> second match (say Pseudomonas) at >99% and a third, fourth and
>>>> fifth match at >99% for three other organisms. This sequence _must_ be
>>>> discarded because it is a conserved sequence.
>>>> Basically conserved sequence is the enemy and invalidates the
>>>> entire result.
>>>> *Another problem:*
>>>> If you have a reference sample with 19 non-model microbes, and you
>>>> run that by HiSeq Shotgun for metagenomics and then megablast, what
>>>> do you think you get? If E. coli is not in the reference sample,
>>>> how many E. coli hits do you think you get? Yes, tens of thousands. So
>>>> without removing conserved sequences, your data is wrong and you
>>>> are much better served by culturing and running a Biolog metabolic
>>>> panel and comparing to the sequence result.
>>>> So where do we start? I have some shotgun metagenomics data from
>>>> the reference sample which included the 19 microbes. That was data
>>>> from a MiSeq.
>>>> Scott Tighe
>>>> On 9/20/2013 9:17 PM, Jing Yu wrote:
>>>>> Hi Scott,
>>>>> I can do some perl programming, such as local/remote blasting. Can
>>>>> you specify your problem a little more clearly, so that maybe I can
>>>>> write a program to do just that?
>>>>> 16s is basically useless for identification to genus. Since I
>>>>> started sequencing 16s in 1992, I have come to realize that
>>>>> without sequencing the full 1540 bases, it is generally
>>>>> misleading, and even then, it is not accurate enough to nail genus
>>>>> in more than 1/2 the cases. However, what is your feeling on
>>>>> ITS and gyrase? They seem to be far more discriminating but those
>>>>> databases have been decommissioned some time ago.
>>>>> The desirable thing would be that Galaxy or NCBI add a "filter
>>>>> conserved genes" [ ie any hit with a second choice greater than 3%
>>>>> distance]. Something such as that.
>>>>> If you (or others) are aware of such a thing, I'd love to hear
>>>>> about it.
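The filter discussed in this thread (discard a read when its high-identity hits span several genera) could be prototyped on BLAST tabular output. This is a sketch under assumptions, not a tested pipeline: it assumes the default tabular layout of BLAST+ (`-outfmt 6`, with query id, subject id and percent identity in the first three columns) and a subject-to-genus lookup that you would have to build yourself. The name `genus_of`, the 99% identity threshold and the limit of 5 genera are illustrative values taken from the examples above.

```python
from collections import defaultdict


def conserved_queries(blast_lines, genus_of, min_identity=99.0, max_genera=5):
    """Return query ids whose high-identity hits span at least max_genera genera.

    blast_lines: lines of BLAST tabular output (qseqid, sseqid, pident, ...).
    genus_of: dict mapping subject id -> genus name (hypothetical lookup).
    Hits with identity at or above min_identity are counted.
    """
    genera = defaultdict(set)
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject, identity = fields[0], fields[1], float(fields[2])
        if identity >= min_identity and subject in genus_of:
            genera[query].add(genus_of[subject])
    return {q for q, found in genera.items() if len(found) >= max_genera}


def drop_conserved(blast_lines, conserved):
    """Keep only hit lines whose query is not flagged as conserved."""
    return [line for line in blast_lines if line.split("\t")[0] not in conserved]
```

Counting distinct genera rather than raw hits is what keeps 13000 E. coli entries from outweighing 3 Sphaerotilus entries: a read is judged by how many different genera it matches, not by how many database records exist for each.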
I just established Galaxy Cloudman on Amazon EC2. I created a cluster named
"exon_capture" and uploaded a lot of data to it. After some analysis, I
terminated the cluster. The second time I wanted to get access to the
"exon_capture" cluster, I created a new instance under the "exon_capture".
However, under this instance, the "access galaxy" button is greyed out and not
active. I tried several times, but the button was always grey. Does anyone
know what's wrong? Did I miss something? It may be a very simple question,
but it has bothered me for a whole afternoon. Thanks a lot!