October 2013 - galaxy-user - lists.galaxyproject.org

Adding a tool to Galaxy Main Server
by Hamid Reza Hassanzadeh 07 Oct '13

Hi everybody,

We are going to add our gene prediction tool to the Galaxy Main server. Does anybody know whom we should contact?

Thanks

--
Hamid Reza Hassanzadeh,
PhD Student & Graduate Research Assistant,
Bioinformatics Lab, Center for Bioinformatics and Computational Genomics
Joint Georgia Tech and Emory Wallace H Coulter Department of Biomedical Engineering,
Department of Computer Science at Georgia Institute of Technology
office: KACB 1343

Fwd: Exome Sequencing Analysis
by Johnathan Cooper-Knock 07 Oct '13

Hello,

My name is Johnathan Cooper-Knock. I am a clinical fellow based at the University of Sheffield, UK. I am trying to use Galaxy for analysis of DNA sequencing data and I have run into a problem.

I am trying to run the SAM/BAM Hybrid Selection Metrics step on Galaxy (part of the Picard tools), but I can't find bait and target BED files that Galaxy will accept. My library capture was performed using the Agilent All Exon V4 kit, and I have uploaded the 'S03723314_Regions.bed' and 'S03723314_Covered.bed' files from 'https://earray.chem.agilent.com/suredesign/search.htm', but Galaxy does not even seem to recognise them as options for entry.

Is this a formatting issue? Do you know where I can get ready-formatted files?

Thanks
Johnathan

galaxy cloud: location of imported and converted fastq's
by p Vedell 07 Oct '13

Hi,

I had fastq.gz files in an Amazon S3 bucket. I created a Galaxy on the Cloud instance using CloudMan. I used Get Data and pasted the URLs of the fastq.gz files into Galaxy on the Cloud. They were successfully imported into the session and converted to FASTQ files, it seems.

However, at the next step (FASTQ Groomer), Galaxy on the Cloud appears to expect the imported and converted FASTQ files to be in the same S3 bucket where the fastq.gz files were. But they are not there. I read somewhere in the documentation or notes that the data is actually in the /mnt/galaxyData folder. However, I am not sure how to point FASTQ Groomer to this location in the web interface. Alternatively, I am not sure how to move the FASTQ files back to the S3 bucket (without downloading and re-uploading, which would be very time-consuming).

Thanks for any help you can provide.
PT

Re: [galaxy-user] Metagenomic filtering
by Scott Tighe 06 Oct '13

Dear Jing

What you have outlined below is perfect. I wonder how hard it would be to design a few filters that only look at certain genes and/or filter model organisms out of the dataset. For example, say you want only data for 16S or only gyrase, but no /E. coli/ and no /Pseudomonas aeruginosa/.

Scott

Scott Tighe
Senior Core Laboratory Research Staff
Advanced Genome Technologies Core
University of Vermont
Vermont Cancer Center
149 Beaumont ave
Health Science Research Facility 303/305
Burlington Vermont 05405
802-656-2557

On 9/25/2013 12:06 AM, Jing Yu wrote:
> Hi Scott,
>
> My first thought is:
>
> 1. Remove rDNA sequences (and/or other well-known highly conserved
>    sequences, to reduce the workload in step 2).
> 2. Blast, then remove sequences with > (say 99%) match to > (say 5)
>    genera. (Optional if step 1 is already good enough.)
>
> For step 1:
>   Build a FASTA file of the chosen highly conserved sequences, and
>   use it as a feed to blast against your MiSeq result.
>   Remove positive hits.
> For step 2:
>   Blast the remaining MiSeq sequences against the NCBI (or whatever)
>   database.
>   Remove a sequence if it hits more than n genera.
>
> Jing
>
> On 24 Sep 2013, at 22:17, Scott Tighe <scott.tighe(a)uvm.edu> wrote:
>
>> Jing et al
>>
>> Thank you for the offer to write some code to help advance the
>> metagenomics arena. It is certainly needed.
>>
>> The problem with megablast and shotgun metagenomics is well known:
>> without proper understanding and the correct software, it will yield
>> very misleading and in many cases incorrect data. Those of us who
>> wish NOT to move to a protein level of comparison, for specific
>> reasons, are stuck.
>>
>> *The Problem:*
>>
>> If I megablast 50 million sequences from a HiSeq run, millions of
>> rRNA sequences will have a 99% match to all microbial rRNA GenBank
>> deposits. Not surprising, since the rRNA is highly conserved. The
>> difference between E. coli and Shigella is 1 to 2 bases over the
>> full 1540 bp 16S. So 16S is not useful at the genus level, and
>> certainly not at the species level.
>>
>> *So what happens:*
>>
>> The returned matches will have many hits to whatever model organism
>> is in GenBank. For example, E. coli has 13000 entries for rRNA and
>> Sphaerotilus has 3 entries for rRNA. If the blasted sequence matches
>> both, the results will mislead the investigator into thinking they
>> have 13000 hits to E. coli, EVEN if the microbe is Sphaerotilus.
>>
>> *The cure?:*
>>
>> Is there a way to filter/remove all such hits? Let's say, for
>> example, that a result has a first match (say E. coli) at >99%, a
>> second match (say Pseudomonas) at >99%, and a third, fourth and
>> fifth match at >99% for three other organisms. This sequence _must_
>> be discarded because it is a conserved sequence.
>>
>> Basically, conserved sequence is the enemy and invalidates the
>> entire result.
>>
>> *Another problem:*
>>
>> If you have a reference sample with 19 non-model microbes, and you
>> run that by HiSeq shotgun for metagenomics and then megablast, what
>> do you think you get? If E. coli is not in the reference sample, how
>> many hits do you think you get? Yes, tens of thousands. So without
>> removing conserved sequences, your data is wrong and you are much
>> better served by culturing and running a Biolog metabolic panel and
>> comparing to the sequence result.
>>
>> So where do we start? I have some shotgun metagenomics data from the
>> reference sample which included the 19 microbes. That was data from
>> a MiSeq.
>>
>> Scott
>>
>> Scott Tighe
>> Senior Core Laboratory Research Staff
>> Advanced Genome Technologies Core
>> University of Vermont
>> Vermont Cancer Center
>> 149 Beaumont ave
>> Health Science Research Facility 303/305
>> Burlington Vermont 05405
>> 802-656-2557
>>
>> On 9/20/2013 9:17 PM, Jing Yu wrote:
>>> Hi Scott,
>>>
>>> I can do some Perl programming, such as local/remote blasting. Can
>>> you specify your problem a little more clearly, so that maybe I
>>> can write a program to do just that?
>>>
>>> Regards,
>>> Jing
>>>
>>>
>>> Gerald
>>>
>>> 16S is basically useless for identification to genus. Since I
>>> started sequencing 16S in 1992, I have come to realize that without
>>> sequencing the full 1540 bases it is generally misleading, and
>>> even then it is not accurate enough to nail the genus in more than
>>> 1/2 the cases. However, what is your feeling on ITS and gyrase?
>>> They seem to be far more discriminating, but those databases were
>>> decommissioned some time ago.
>>>
>>> The desirable thing would be for Galaxy or NCBI to add a "filter
>>> conserved genes" option [i.e. any hit with a second choice greater
>>> than 3% distance]. Something such as that.
>>>
>>> If you (or others) are aware of such a thing, I'd love to hear
>>> about it.
>>>
>>> Sincerely
>>> Scott
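
Step 2 of Jing's outline (and the "cure" Scott describes) can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the thread: it assumes the BLAST results have already been parsed into (query_id, genus, percent_identity) tuples, and the function names and the default thresholds (99% identity, 5 genera, taken from the "say 99%" and "say 5" figures in the message) are hypothetical.

```python
from collections import defaultdict


def find_conserved_queries(hits, min_identity=99.0, max_genera=5):
    """Return the IDs of reads whose high-identity hits span too many genera.

    hits: iterable of (query_id, genus, percent_identity) tuples.
    This input layout is an assumption; real BLAST output would first need
    parsing and a taxonomy lookup to obtain the genus of each subject.
    """
    genera_by_query = defaultdict(set)
    for query_id, genus, identity in hits:
        # Only hits above the identity threshold count toward the genus tally.
        if identity > min_identity:
            genera_by_query[query_id].add(genus)
    # A read matching more than max_genera genera is treated as conserved.
    return {q for q, genera in genera_by_query.items() if len(genera) > max_genera}


def remove_conserved(hits, min_identity=99.0, max_genera=5):
    """Drop every hit that belongs to a read flagged as conserved."""
    hits = list(hits)  # materialize, since the hits are iterated twice
    conserved = find_conserved_queries(hits, min_identity, max_genera)
    return [hit for hit in hits if hit[0] not in conserved]


if __name__ == "__main__":
    # Made-up example data: read1 matches six genera, read2 only one at >99%.
    example_hits = [
        ("read1", "Escherichia", 99.8),
        ("read1", "Shigella", 99.7),
        ("read1", "Pseudomonas", 99.5),
        ("read1", "Salmonella", 99.4),
        ("read1", "Klebsiella", 99.3),
        ("read1", "Sphaerotilus", 99.2),
        ("read2", "Sphaerotilus", 99.6),
        ("read2", "Escherichia", 97.0),
    ]
    for hit in remove_conserved(example_hits):
        print(hit)
```

Counting distinct genera rather than raw hits is what keeps an over-represented organism (such as the 13000 E. coli rRNA entries mentioned above) from dominating the decision; both thresholds would need tuning on real data.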

add nglims to galaxy-dist
by Peter Huang 03 Oct '13

Hi Brad,

Could you tell me how I can add your nglims to my galaxy-dist production system? I have read your step-by-step procedure here: http://wiki.galaxyproject.org/Admin/Sample%20Tracking/Next%20Gen but it doesn't mention how to add it to galaxy-dist. Do you have a patch name for it? Should I use hg pull or hg patch?

Thanks

Best,
Peter

Can TopHat take SNPs into account?
by Hoang, Thanh 03 Oct '13

Hi,

I have been using TopHat to map RNA-seq data from one mouse strain to the genome of a different mouse strain. I am wondering whether TopHat can take SNPs into account during the alignment (using a SNP track as an optional input)?

Thanks
Thanh

TopHat
by Kumar Sankaran 03 Oct '13

Hi all,

I am trying to run TopHat for Illumina on Galaxy's Main server, and the job has been in the queue for the past 3 to 4 days. Is this normal?

Thanks & Regards,
Kumar.

queue rules for multi-step workflows
by zuzmus 03 Oct '13

Dear Galaxy managers,

I would like to ask about the queuing rules for workflows before they are processed on the server. I use a customized Galaxy workflow which contains 22 consecutive steps (basically filtering, trimming and format conversion of NGS data). I guess the server has generally been very busy during the last weeks/months (?), so my job waited about 24 hours in the queue (which would not be a problem), and then the first step of the workflow was processed, but the following 21 are again/still waiting (already for another couple of hours...).

This makes me wonder about the queuing rules, because I expected the whole workflow to be queued as one job. So my question is: is the whole workflow, once submitted, listed in the queue, or is each following step queued only after the previous step has finished (which would mean waiting through the whole queue for each step of the workflow...)?

I routinely used those workflows before (months ago) without any problems... I tried to search for a similar question in the archive before posting this one...

Thanks a lot for your answer,
My best
Zuzana Musilova

--
Zuzana Musilova, PhD. (zuzmus(a)gmail.com)
Zoological Institute
University of Basel
Vesalgasse 1, CH-4051 Basel
Switzerland - Europe

Please save these dates! GCC2014: June 30 - July 2
by Dave Clements 02 Oct '13

Hello all,

The 2014 Galaxy Community Conference (GCC2014) <http://wiki.galaxyproject.org/Events/GCC2014> has been scheduled for June 30 through July 2, at the Homewood Campus <http://webapps.jhu.edu/jhuniverse/information_about_hopkins/campuses/homewo…> of Johns Hopkins University <http://jhu.edu/>, in Baltimore, Maryland <http://visitors.baltimorecity.gov/>, United States.

Galaxy Community Conferences are an opportunity to participate in presentations, discussions, poster sessions, lightning talks and breakouts, all about high-throughput biology and the tools that support it. The conference will also include a *Training Day* offering in-depth topic coverage across several concurrent sessions. See the GCC2013 web site <http://wiki.galaxyproject.org/Events/GCC2013> for an idea of what happens at a Galaxy Community Conference.

See you next summer,

GCC2014 Organizing Committee <http://wiki.galaxyproject.org/Events/GCC2014/Organizers>

--
http://galaxyproject.org/
http://getgalaxy.org/
http://usegalaxy.org/
http://wiki.galaxyproject.org/

URL problem
by Kucukural, Alper 02 Oct '13

Hi,

I have a problem in Galaxy with resolving the host/domain name on two different pages. The first is tool installation from the Tool Shed, where I get the error below:

#####
Not Found
The requested URL /admin_toolshed/prepare_for_install was not found on this server.
#####

The second is in the saved histories. When I click the buttons of the saved histories, I get a similar error:

#####
Not Found
The requested URL /history/list was not found on this server.
#####

I haven't seen this on any other pages yet. My installation uses LDAP authentication with a proxy. I could not find a place to set the domain or host name so that these two pages can actually find the requested URLs. The paster.log file does not report any error when I install a tool or go to another history.

Thanks for your help,
Alper Kucukural
