October 2013 - galaxy-user - lists.galaxyproject.org

unable to import run or save-to-file published workflow after galaxy upgrade
by McCulloch, Alan 11 Oct '13

Dear all,

We've just upgraded our Galaxy server (from revision 7148:17d57db9a7c0 to revision 10422:a886bc3ae924) and have found that an NGS training workflow that one of us set up and published is no longer accessible. We now get a stack trace when we try to run it (screenshot attached). If we try to save it to file, we get "page not found" from the browser, but no errors are reported in the web logs or the Galaxy log, as far as we can see.

Is there any way we can recover the workflow? How can we avoid this happening in future upgrades? This time it is not a major problem, but we would not want this to happen to production workflows. Should we save-to-file all workflows as a precaution before an upgrade? (Would that help?)

The workflow included the steps below. Grateful for any suggestions.

Cheers
Alan McC

First part: checking GC content
* Upload the sampling.fasta file or get it from the shared data in Galaxy.
* Use geecee from EMBOSS
* Remove beginning (Text manipulation), first line
* Convert as tabular (Text manipulation) or cut (Text manipulation)
* Histogram (Graph/Display Data) on col2
* Compute data (Text manipulation) (data4 convert; col2>0.2)
* Count data (statistics on col3)
If it is worth it, we can remove the sequences whose GC content is too low.

Second part: sequence length
* Compute Sequence length (FASTA manipulation)
* Summary Statistics (Statistics) on col2
* Filter by length (FASTA manipulation); 800
* Line/Word/Character count (Text manipulation)
* Blastn against fungi db: yeast, only one hit to show (advanced options)
* Count the lines to know how many sequences have a hit. (Text manipulation)

Bonus track: removing sequences with low GC content
* Select (filter and sort) on data 6 where matching = True
* FASTA-to-Tabular (FASTA manipulation) the input file sampling.fasta (2 col for the title)
* Join two datasets (Join, Subtract and Group) on the columns c1
* Tabular-to-FASTA (FASTA manipulation) of the previous results.
* You can see how many sequences you took away (click on the dataset name in History) and it should correspond to the number of True in Count.
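
For readers who want to check the GC-content and length steps listed in the message above outside Galaxy, here is a minimal Python sketch. It is not the workflow itself and not how geecee or the Galaxy tools are implemented. The function names are hypothetical, the 0.2 GC threshold is taken from the "col2>0.2" step, and treating 800 as a minimum length is an assumption, since the message does not say which direction the length filter runs.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (title, sequence) pairs from a FASTA-formatted text stream."""
    title, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        else:
            chunks.append(line)
    if title is not None:
        yield title, "".join(chunks)


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty one)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def filter_records(records, min_gc=0.2, min_length=0):
    """Keep records with GC fraction above min_gc and length >= min_length.

    min_length is an assumed interpretation of the "Filter by length; 800" step.
    """
    return [
        (title, seq)
        for title, seq in records
        if gc_fraction(seq) > min_gc and len(seq) >= min_length
    ]


# Tiny in-memory stand-in for sampling.fasta (made-up sequences).
demo = StringIO(">seq1\nATGCGC\nGG\n>seq2\nATATATATAT\n")
records = list(read_fasta(demo))
kept = filter_records(records, min_gc=0.2)
print(len(records) - len(kept), "sequence(s) removed for low GC content")
```

To apply the length step as well, call `filter_records(records, min_gc=0.2, min_length=800)` on the real file's records.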

Tophat for Illumina - built-in dog genome mislabelled
by Court, Michael 11 Oct '13

Hi,

A minor annoyance that I have found with the current implementation of "Tophat for Illumina (version 1.5.0)" on the public usegalaxy.org site: when you submit sequences for alignment, the dropdown list of available genomes gives two dog genome choices, but both are labeled the same: "Dog(Canis lupus familiaris): canFam2". After trial and error I found that the second (lower on the list) choice is actually "Dog(Canis lupus familiaris): canFam3.1". Hopefully this can be corrected at some point, maybe when the next release comes out from the Broad.

Michael

Michael H. Court, BVSc, PhD, Diplomate ACVA
Professor and William R. Jones Endowed Chair
Individualized Medicine Program
Pharmacogenomics Laboratory
Department of Veterinary Clinical Sciences
College of Veterinary Medicine, Washington State University
100 Grimes Way, Pullman, WA 99164-6610
Office: 509-335-0817 Cell: 774-287-7082 Fax: 509-335-0880
michael.court(a)vetmed.wsu.edu
www.vetmed.wsu.edu

Trouble viewing History from a particular laptop
by Olivia Choudhury 10 Oct '13

Hello,

I am a graduate student at the University of Notre Dame. I have installed a personal instance of Galaxy and have been developing tools there as part of my research. Everything worked well on my laptop until recently, when I found I could no longer view the 'History' panel, no matter which browser (Firefox, Chrome, Safari) I used on my laptop (Windows 7). Everything works fine and I can view 'History' when I use a desktop or any other laptop. I have checked this across different operating systems and also made sure that no changes were made to the firewall setup of my laptop. I have attached a screenshot of what the Galaxy page looks like. It would be great if I could get some help with this.

Thanks a lot in advance,
Olivia

Problems processing histories existing on galaxy before the changeover
by Elwood Linney 10 Oct '13

Is anyone who had developed histories on Galaxy online before the changeover getting red error messages after the changeover for tophat or cuffdiff processing? I have now tried a few times to move forward without luck and will remove at least one of my histories and start all over again just in case there is some kind of mismatch in the processing. Before I remove them all and start them all over, I would appreciate knowing if anyone else has identified where the block is. Elwood Linney

Online galaxy turning everything red on RNAseq steps
by Elwood Linney 10 Oct '13

Hello, I have been waiting a few weeks to process some RNA seq datasets but woke up this morning with lots of steps red. I thought it just might be because of the movement of the system but I processed steps for some histories and everything has turned red. I also noticed that online Galaxy now has RNAseq steps separated into two sections--does this have something to do with the problems? el linney Duke University Medical Center

Empty bowtie2 output
by Mikel Egaña Aranguren 10 Oct '13

Hi,

I'm trying to use the bowtie2 wrapper, but it generates an empty output with the "no peek" sign. I have successfully recreated the analysis with the same datasets on the shell, so it must be something to do with Galaxy, but I don't know where to look. Any clues?

Thanks

I'm using bowtie2-2.1.0 and Galaxy:

changeset: 10421:a477486bf18e
branch: stable
tag: tip
user: Nate Coraor <nate(a)bx.psu.edu>
date: Thu Sep 26 11:02:58 2013 -0400
summary: Bugfix for tool-to-destination mapping, tool ids are lowercased but the mapping id was not lowercased

--
Mikel Egaña Aranguren, Ph.D.
http://mikeleganaaranguren.com

Galaxy available disk space bug
by Itys Comet 08 Oct '13

Hi everybody,

Although I purged all data from my Galaxy account and double-checked that my account is actually empty by looking at Saved Histories/Advanced Search/status:all, my account indicates that I am using half of my quota (as shown in User/Preferences). Moreover, this is preventing me from actually using more than half of my account, which shows that it is not a display problem. It is also not a refresh-rate problem, since this started several months ago. Could you please help me with that?

All the best,
Itys

--
Itys COMET, Ph.D. - Helin Group
BRIC - University of Copenhagen
Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark
Phone: +45-353 25602 Fax: +45-353 25669
Email: itys.comet(a)bric.ku.dk
Web page: http://www.bric.ku.dk/research/Helin_Group/

replace function on galaxy
by Xianrong Wong 08 Oct '13

Hello, I have been using the compute function in galaxy to replace sequences with delimiters for processing my reads. I realized that the replace "code" in compute no longer works. Is there any other way to replace sequences with delimiters? Jose

Re: [galaxy-user] Metagenomic filtering
by Scott Tighe 08 Oct '13

Jing,

All good thoughts, and if I remember correctly, custom software can indeed be incorporated into Galaxy through use of the "Toolshed". I'll check into this with Jennifer.

Thanks
Scott

Scott Tighe
Senior Core Laboratory Research Staff
Advanced Genome Technologies Core
University of Vermont
Vermont Cancer Center
149 Beaumont ave
Health Science Research Facility 303/305
Burlington Vermont 05405
802-656-2557

On 10/6/2013 9:59 AM, Jing Yu wrote:
> Dear Scott,
>
> I think what you propose is doable.
>
> You may
> 1. use a 16s or gyrase DNA sequence as feeds to blast against your data to get the relative sequences,
> 2. and then use the sequences as feeds to blast against your nucleotide database with appropriate filters.
>
> There are several ways to make the steps. For example, you may already have the 16s sequence from assembly against a reference genome.
> And for Step 2, if you are not blasting thousands of times a day, and believe in the recent stability of NCBI, then a simple web_blast code will do the trick. Otherwise, since the local blast+ toolkit doesn't provide the equivalent organism filters, you'll have to work a wee bit on it:
>
> Make a nucleotide database for Prokaryotes.
> Search txid561[ORGN] on http://www.ncbi.nlm.nih.gov/nuccore (this is for Escherichia as an example),
> Send to 'File' -> Format -> GI List
> When Blast, use this GI list as the value of this argument: -negative_gilist
> Then parse the Blast result.
>
> Most of these can be automated with some code, but I don't know how to incorporate it into Galaxy.
>
> Regards,
> Jing
>
> On 4 Oct 2013, at 23:52, Scott Tighe <scott.tighe(a)uvm.edu> wrote:
>
>> Dear Jing
>>
>> What you have outlined below is perfect.
>>
>> I wonder how hard it would be to design a few filters that only look at certain genes and/or filter model organisms out of the dataset.
>>
>> For example, say you want only data for 16s or only gyrase, but no E. coli and no Pseudomonas aeruginosa.
>>
>> Scott
>>
>> On 9/25/2013 12:06 AM, Jing Yu wrote:
>>> Hi Scott,
>>>
>>> My first thought is:
>>>
>>> 1. Remove rDNA sequences (and/or other well known highly-conserved sequences to reduce the workload in step 2).
>>> 2. Blast, then remove sequences with > (say 99%) match to > (say 5) genus. (Optional if step 1 is already good enough)
>>>
>>> For step 1:
>>> Build a fasta file of the chosen highly conserved sequences, and use it as a feed to blast against your MiSeq result.
>>> Remove positive hits.
>>> For step 2:
>>> Blast remaining MiSeq sequences against NCBI (or whatever) database.
>>> Remove if it hits more than n genus.
>>>
>>> Jing
>>>
>>> On 24 Sep 2013, at 22:17, Scott Tighe <scott.tighe(a)uvm.edu> wrote:
>>>
>>>> Jing et al
>>>>
>>>> Thank you for the offer to write some code to help advance the metagenomics arena. It is certainly needed.
>>>>
>>>> The problem with megablast and shotgun metagenomics is well known: without proper understanding and correct software, it will yield very misleading and in many cases incorrect data. For those of us who wish NOT to move to a protein level of comparison for specific reasons, we are stuck.
>>>>
>>>> *The Problem:*
>>>>
>>>> If I megablast 50 million sequences from a HiSeq run, millions of rRNA sequences will have a 99% match to all microbes' rRNA GenBank deposits. Not surprising, since the rRNA is highly conserved. The difference between E. coli and Shigella is 1 to 2 bases for the full 1540 bp 16s. So 16s is not useful for Genus level, and certainly not Species.
>>>>
>>>> *So what happens:*
>>>>
>>>> The returned matches will have many hits to whatever model organism is in GenBank. For example, E. coli has 13000 entries for rRNA and Sphaerotilus has 3 entries for rRNA. If the blasted sequence matches both, the results will mislead the investigator to think they have 13000 hits to E. coli, EVEN if the microbe is Sphaerotilus.
>>>>
>>>> *The cure?:*
>>>>
>>>> What if there were a way to filter/remove all such hits? Let's say, for example, that a result has a first match (say E. coli) at >99%, a second match (say Pseudomonas) at >99%, and a third, fourth and fifth match at >99% for three other organisms. This sequence _must_ be discarded because it is a conserved sequence.
>>>>
>>>> Basically, conserved sequence is the enemy and invalidates the entire result.
>>>>
>>>> *Another problem:*
>>>>
>>>> If you have a reference sample with 19 non-model microbes, and you run that by HiSeq shotgun for metagenomics and then megablast, what do you think you get? If E. coli is not in the reference sample, how many hits do you think you get? Yes, tens of thousands. So without removing conserved sequences, your data is wrong and you are much better served by culturing and running a Biolog metabolic panel and comparing to the sequence result.
>>>>
>>>> So where do we start? I have some shotgun metagenomics data from the reference sample which included the 19 microbes. That was data from a MiSeq.
>>>>
>>>> Scott
>>>>
>>>> On 9/20/2013 9:17 PM, Jing Yu wrote:
>>>>> Hi Scott,
>>>>>
>>>>> I can do some perl programming, such as local/remote blasting. Can you specify your problem a little bit clearer, so that maybe I can write a program to do just that?
>>>>>
>>>>> Regards,
>>>>> Jing
>>>>>
>>>>> Gerald
>>>>>
>>>>> 16s is basically useless for identification to genus. Since I started sequencing 16s in 1992, I have come to realize that without sequencing the full 1540 bases, it is generally misleading, and even then, it is not accurate enough to nail genus in more than half the cases. However, what is your feeling on ITS and gyrase? They seem to be far more discriminating, but those databases were decommissioned some time ago.
>>>>>
>>>>> The desirable thing would be that Galaxy or NCBI add a "filter conserved genes" [i.e. any hit with a second choice greater than 3% distance]. Something such as that.
>>>>>
>>>>> If you (or others) are aware of such a thing, I'd love to hear about it.
>>>>>
>>>>> Sincerely
>>>>> Scott
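
The "conserved sequence" filter discussed in this thread (discard a read when its hits above 99% identity span many different genera) could be prototyped roughly as follows. This is only a sketch under stated assumptions, not code from Jing or Scott: the function name `conserved_queries` is hypothetical, the input is assumed to be BLAST hits already reduced to (read ID, genus, percent identity) tuples, and the genus lookup itself is not shown. The 99% and five-genera thresholds follow the "say 99%" and "say 5" examples in the messages and are placeholders.

```python
from collections import defaultdict


def conserved_queries(hits, min_identity=99.0, min_genera=5):
    """Return IDs of reads whose high-identity hits span min_genera or more genera.

    hits: iterable of (query_id, genus, percent_identity) tuples. This layout
    is an assumption; standard BLAST tabular output does not include a genus
    column, so the genus has to be looked up beforehand.
    """
    genera_per_query = defaultdict(set)
    for query_id, genus, percent_identity in hits:
        if percent_identity > min_identity:
            genera_per_query[query_id].add(genus)
    return {
        query_id
        for query_id, genera in genera_per_query.items()
        if len(genera) >= min_genera
    }


# Made-up hits: read1 matches five genera at >99%, read2 only one.
hits = [
    ("read1", "Escherichia", 99.8),
    ("read1", "Shigella", 99.7),
    ("read1", "Pseudomonas", 99.5),
    ("read1", "Salmonella", 99.4),
    ("read1", "Klebsiella", 99.2),
    ("read2", "Sphaerotilus", 99.9),
    ("read2", "Escherichia", 92.0),
]

discard = conserved_queries(hits)
kept = sorted({query_id for query_id, _, _ in hits} - discard)
print("discarded:", sorted(discard))
print("kept:", kept)
```

Counting distinct genera rather than raw hits is deliberate: it avoids the bias described above, where an organism with thousands of GenBank entries dominates the hit list.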

How to launch built cluster on CloudMan?
by Yan He 08 Oct '13

Dear all,

I just set up Galaxy CloudMan on Amazon EC2. I created a cluster named "exon_capture" and uploaded a lot of data to it. After some analysis, I terminated the cluster. The second time I wanted to access the "exon_capture" cluster, I created a new instance under "exon_capture". However, under this instance, the "access galaxy" button is greyed out and not active. I tried several times, but the button was always grey. Does anyone know what's wrong? Did I miss something? It may be a very simple question, but it has bothered me for a whole afternoon.

Thanks a lot!
Yan
