November 2010 - galaxy-user - lists.galaxyproject.org

Re: [galaxy-user] Blast and Fetching Data from NCBI
by Peter 09 Nov '10

Hi John,

I've CC'd the mailing list again (assuming you omitted it by mistake - it's easily done).

On Tue, Nov 9, 2010 at 8:43 PM, John David Osborne wrote:
>
>> I've written a BLAST+ wrapper which is now in galaxy-central, and
>> will eventually be in galaxy-dist and thus potentially on the public
>> Galaxy server (assuming it won't tax it too much). This doesn't
>> (yet) offer the option to run the BLAST remotely at the NCBI
>> (the wrapper could do this in theory). However, running a BLAST
>> against the NCBI databases will cause difficulties for reproducibility
>> (since we have no control over the databases, and the NCBI make
>> regular updates).
>
> Understood, I assume then most people get their sequence data
> and upload before starting a workload... That wrapper does sound
> handy though! It just seems odd to me that the Galaxy portal
> doesn't have a generic blast tool and standard (nt, nr, etc...) to
> run against.

Well, I'm sure they will consider it in future. But if I were in their shoes, I'd be a little nervous about the computational load it might impose.

>> Would you be running your own Galaxy server?
>
> That's the current plan. We are also looking at Taverna, but it
> doesn't seem like I can find much in the way (actually none)
> of biology publications that use it as part of their workflow. I
> haven't looked that hard though.

Within our institute we intend to offer BLAST running on our departmental server (in Galaxy) for biologists to run large searches (e.g. multiple query sequences like a set of contigs) against both NCBI databases like NR and also in-house ones (e.g. unpublished genomes), and as part of workflows (e.g. upload a FASTA file, blast against organism X, divide the FASTA file into those with good matches and those without).

This is drifting into a discussion more suited for the galaxy-dev mailing list though ;)

Peter
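The last workflow step mentioned above (dividing a FASTA file into sequences with good BLAST matches and those without) can be sketched roughly as follows. This is only an illustration, not the Galaxy tool itself: it assumes the BLAST results are in the standard 12-column tabular format (query ID in column 1, e-value in column 11), and the function names and the 1e-5 e-value cutoff are hypothetical choices.

```python
def read_fasta(lines):
    """Yield (identifier, header_line, sequence) for each FASTA record."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield _identifier(header), header, "".join(seq)
            header, seq = line, []
        elif header is not None:
            seq.append(line.strip())
    if header is not None:
        yield _identifier(header), header, "".join(seq)


def _identifier(header):
    """Return the first whitespace-delimited word after '>' (empty if none)."""
    words = header[1:].split()
    return words[0] if words else ""


def queries_with_hits(blast_lines, max_evalue=1e-5):
    """Collect query IDs having at least one hit with e-value <= max_evalue.

    Assumes 12-column tabular BLAST output (hypothetical cutoff value).
    """
    hits = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        if float(fields[10]) <= max_evalue:
            hits.add(fields[0])
    return hits


def split_fasta(fasta_lines, blast_lines, max_evalue=1e-5):
    """Return (matched, unmatched) lists of (header, sequence) tuples."""
    hits = queries_with_hits(blast_lines, max_evalue)
    matched, unmatched = [], []
    for ident, header, seq in read_fasta(fasta_lines):
        (matched if ident in hits else unmatched).append((header, seq))
    return matched, unmatched
```

Both inputs are plain iterables of lines, so open file handles can be passed directly, e.g. split_fasta(open("contigs.fasta"), open("hits.tabular")) with file names of your own.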

Galaxy Production Server Settings
by lentaing＠jimmy.harvard.edu 09 Nov '10

Hi,

We have deployed a local instance of Galaxy and have recently been experiencing several problems which we believe to be related to our server being overloaded with user requests. We get the following error message (from the log files):

raise exc.TimeoutError("QueuePool limit of size %d overflow %d reached, connection timed out, timeout %d" % (self.size(), self.overflow(), self._timeout))
TimeoutError: QueuePool limit of size 10 overflow 20 reached, connection timed out, timeout 30

Could someone explain why we might get this error message and how we might configure our server settings to solve this problem?

Is universe_wsgi.ini.sample a configuration for production server instances? If not, could someone post some suggested settings for production servers (e.g. the settings that you use for the main galaxy instance http://main.g2.bx.psu.edu/)?

Finally, is there a wiki page explaining the settings in universe_wsgi.ini?

Thanks,
Len
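For readers puzzling over the numbers in that message: SQLAlchemy's QueuePool keeps up to "size" pooled connections and allows up to "overflow" extra ones, so the error above means all 10 + 20 = 30 connections were checked out and a further request waited 30 seconds without one being returned. The toy class below only imitates that behaviour with the standard library to make the arithmetic concrete; TinyPool and PoolTimeout are hypothetical names, and this is not SQLAlchemy's implementation.

```python
import threading


class PoolTimeout(Exception):
    """Raised when no connection slot becomes free within the timeout."""


class TinyPool:
    """Toy bounded pool: at most pool_size + max_overflow checkouts at once."""

    def __init__(self, pool_size, max_overflow, timeout):
        self.limit = pool_size + max_overflow
        self.timeout = timeout
        self._slots = threading.BoundedSemaphore(self.limit)

    def checkout(self):
        # Wait up to `timeout` seconds for a free slot, then give up.
        if not self._slots.acquire(timeout=self.timeout):
            raise PoolTimeout(
                "limit of %d connections reached, timed out after %ss"
                % (self.limit, self.timeout)
            )
        return object()  # stand-in for a real database connection

    def checkin(self, conn):
        # Returning a connection frees its slot for the next request.
        self._slots.release()


if __name__ == "__main__":
    pool = TinyPool(pool_size=10, max_overflow=20, timeout=0.1)
    held = [pool.checkout() for _ in range(30)]  # uses every slot
    try:
        pool.checkout()  # request number 31 has to wait, then fails
    except PoolTimeout as err:
        print("request 31 failed:", err)
```

In this picture the error goes away either when connections are returned sooner or when the limit is raised to match the number of concurrent requests the server actually handles.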

Blast and Fetching Data from NCBI
by John David Osborne 09 Nov '10

Hi,

I am new to Galaxy and I am wondering if it is possible to:

1) Fetch data from NCBI from the main Galaxy portal? I assume this is possible from a local installation. For example, do a query, say looking for a disease in rodents using Ratmine, export to Galaxy (did this fine) and then use the returned identifiers to fetch sequence from NCBI. I can't seem to do this last bit - I don't even see NCBI under "Get Data".

2) Run blast from the main Galaxy portal? Is this disabled due to too many users? Does this option appear once you have sequence?

The context is our group at UAB is evaluating Galaxy and I am just playing around to see what it can do.

-John

Re: [galaxy-user] plugging R into galaxy
by ray mcgovern 09 Nov '10

Hi,

This may not be the 'best' way, but I've found it to be a workable solution. I'm running a Perl script that takes the tool-xml parameters, does some processing, and calls R to obtain a p-val calculation. The results are parsed and included in a formatted HTML page. It's all code based. You can likely use something similar in a Python script:

# --- create the R command
my $cmd=<<END;
echo 'phyper($match, $listcount, $all, $qrycount, lower.tail = FALSE, log.p = FALSE)' > rcmd
END
system($cmd);

# --- run R in command line mode
my $runR=<<END;
R --slave -f rcmd > rpval
END
system($runR);

# --- open and parse results
open my $fh, "<", "rpval";
my $line = <$fh>;
close $fh;
$line =~ /\[1\] (.+)/;

Hope you find this of use.

-ray
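Following up on the remark that something similar should work in Python, here is one possible sketch of the same approach. It keeps the R invocation from the Perl version (R --slave -f <file>) and parses the "[1] value" line R prints; the function names are hypothetical, and it assumes R is on the PATH and prints a single numeric result.

```python
import os
import re
import subprocess
import tempfile


def build_phyper_call(match, listcount, total, qrycount):
    """Build the R expression used in the Perl script above."""
    return (
        "phyper(%s, %s, %s, %s, lower.tail = FALSE, log.p = FALSE)"
        % (match, listcount, total, qrycount)
    )


def parse_r_scalar(output):
    """Extract the number from R output of the form '[1] 0.0123'."""
    found = re.search(r"\[1\]\s+(\S+)", output)
    if found is None:
        raise ValueError("no '[1] value' line in R output: %r" % output)
    return float(found.group(1))


def run_phyper(match, listcount, total, qrycount):
    """Write the expression to a temporary file, run R on it, parse the p-value."""
    expr = build_phyper_call(match, listcount, total, qrycount)
    with tempfile.NamedTemporaryFile("w", suffix=".R", delete=False) as fh:
        fh.write(expr + "\n")
        path = fh.name
    try:
        # Same flags as the Perl version; check=True raises if R exits non-zero.
        result = subprocess.run(
            ["R", "--slave", "-f", path],
            capture_output=True, text=True, check=True,
        )
    finally:
        os.unlink(path)
    return parse_r_scalar(result.stdout)
```

Capturing stdout directly avoids the intermediate rcmd and rpval files, which also means two jobs running in the same directory cannot overwrite each other's results.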

subscription to Galaxy
by Sreenivasula Kurukuti 09 Nov '10

Dear Personal,

I would appreciate it if you could add my email address (sk123p(a)clinmed.gla.ac.uk) to your mailing list.

Thank you

sreenivasulu kurukuti PhD
Division of Cancer Sciences and Molecular Pathology
Section of Pathology and Gene Regulation
Western Infirmity
Glasgow
United Kingdom
G11 6NT
Tel: 44-(0)141 211 2743
Fax: 44-(0)141 337 2494

Unable to import dataset from history into data library
by gerrit 09 Nov '10

Dear Galaxy Users,

Since I've updated my Galaxy version I'm unable to import any dataset from my history into my data library. It returns a message of e.g. "Invalid item id (1464) specified". I then did a complete new test installation to make sure that this problem wasn't because of the update, but the error still remains. I've included a screenshot that reports the error. When I queried the database, the dataset id does exist in the history_dataset_association table.

I went through the paster.log file and found the following warning message:

Unknown library item type: <class 'galaxy.model.HistoryDatasetAssociation'>

I'm sure this is the problem and would need to find a way to fix it. Does anyone have the same issue?

Thanks,
Gerrit

Re: [galaxy-user] [galaxy-bugs] Galaxy tool error report from np0005@uah.edu
by Jennifer Jackson 08 Nov '10

Hi Nripesh,

The reference file is the source genome or any other fasta file that your data is derived from (could be custom). It is the "reference" sequence that you will be using for mapping.

There are a few things to try:

If your data will be mapped to a genome already in Galaxy, then use the pencil icon (for the SAM history item) and alter the attributes to assign a genome. Next, use the same genome as the reference when running SAM->BAM. Please note that not all genomes are indexed for use by SAM tools. If your genome is not here, we are open to requests to add more, if the data is in our main genome list or publicly available from a stable source. Please be specific for requests - exact genome name as we use it, or a link to NCBI, or a link to another public data source is preferred.

If your data is custom, the database can remain undefined (will display as a "?"). Load your custom fasta genome/sequence into your history, if not already there. Then when running SAM->BAM, use the option "locally cached" and set the reference to be that loaded custom fasta file.

Hopefully this helps to resolve the issue. But, if you continue to have problems, please feel free to share your history and we can take a closer look. To do this, at the top of the history pane (right): Options -> Share or Publish -> Make History Accessible via Link and email to me.

Thanks!

Jen
Galaxy team

On 11/8/10 12:20 PM, Nripesh Prasad wrote:
> Hii Jennifer,
> I am doing it that way only, i have uploaded my .sam files in history,
> now when i select NGS: SAM Tools -> SAM-to-BAM, it gives me two options
> to choose the source, if i select history and enter the sam file then it
> is asking for a reference file, What is a reference file adn where do i
> get a reference file ?
> if i do it by the option locally cached as a source then it is giving me
> following error.
>
> *Error(s):*
>
> * No Content-Length: returned in header for
> http://main.g2.bx.psu.edu/display_application/41246453625d8c97/ucsc_bam/mai…,
> can't proceed, sorry
>
> what shopuld i do now?
> Nripesh
>
> On Mon, Nov 8, 2010 at 2:14 PM, Jennifer Jackson <jen(a)bx.psu.edu> wrote:
>
> > Hi Nripesh,
> >
> > Use "NGS: SAM Tools -> SAM-to-BAM". This will create a new BAM data
> > history item.
> >
> > Hope this helps!
> >
> > Jen
> > Galaxy team
> >
> > ps. For new data/usage questions, it would be great for us if you
> > could send them to the mailing list galaxy-user(a)bx.psu.edu.
> > We like to publish answers there for other all to learn from.
> >
> > On 11/8/10 11:31 AM, Nripesh Prasad wrote:
> >
> > > Hii Jennifer,
> > > How can i convert .sam format to .bam format in galaxy.
> > > Nripesh
> >
> > --
> > Jennifer Jackson
> > http://usegalaxy.org

--
Jennifer Jackson
http://usegalaxy.org

Re: [galaxy-user] [galaxy-bugs] Galaxy tool error report from np0005@uah.edu
by Jennifer Jackson 08 Nov '10

Hi Nripesh,

Use "NGS: SAM Tools -> SAM-to-BAM". This will create a new BAM data history item.

Hope this helps!

Jen
Galaxy team

ps. For new data/usage questions, it would be great for us if you could send them to the mailing list galaxy-user(a)bx.psu.edu. We like to publish answers there for all to learn from.

On 11/8/10 11:31 AM, Nripesh Prasad wrote:
> Hii Jennifer,
> How can i convert .sam format to .bam format in galaxy.
> Nripesh

--
Jennifer Jackson
http://usegalaxy.org

Re: [galaxy-user] [galaxy-bugs] report the error
by Jeremy Goecks 08 Nov '10

Hi Mingquan,

Please send your queries to one of our mailing lists -- galaxy-dev or galaxy-user -- rather than to individuals. That way, more people can see and help you with your questions, you'll likely get a more timely response, and the mailing list archive can serve as a useful repository.

Your question about GFF to BED is best addressed to galaxy-user (b/c it's about usage of Galaxy), and your question about Tophat is best addressed to galaxy-dev (b/c it's about setting up a local Galaxy instance). Hence, I've cc'd both lists.

Ok, on to your questions:

> Maybe i can use the tophat anlone.

We've got a Galaxy wrapper ready for Tophat version 1.1+; it will likely be available in galaxy-central this week. Once you update your version of Galaxy, you'll be able to run Tophat in Galaxy.

> But now i have a another question for giff to bed function.
> i want to use gff-to-bed function after i run it, it seems that i got the results :
>
> empty, format: bed, database: ?
> Info: 0 lines converted to BED. Skipped 74166 blank/comment/invalid lines starting with line #1.
>
> i am using gff3 file.and it is the first few lines.
>
> ##gff-version 3
> ##genome-build MSU Rice Genome Annotation Project osa1r6
> ##species Oryza sativa spp japonica cv Nipponbare
> ##sequence-region Chr1 1 43268879
>
> Chr1 MSU_osa1r6 gene 1903 9817 . + . ID=13101.t00001;Name=TBC%20domain%20containing%20protein%2C%20expressed;Alias=LOC_Os01g01010
> Chr1 MSU_osa1r6 mRNA 1903 9817 . + . ID=13101.m00001;Parent=13101.t00001;Alias=LOC_Os01g01010.1
>
> Chr1 MSU_osa1r6 five_prime_UTR 1903 2268 . + . Parent=13101.m00001
> Chr1 MSU_osa1r6 five_prime_UTR 2354 2448 . + . Parent=13101.m00001

This appears to be a valid GFF3 file and should work fine. We'll look into this and get back to you.

Thanks,
J.
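For reference, the conversion being asked about comes down to a coordinate shift: GFF3 intervals are 1-based and inclusive, while BED intervals are 0-based and half-open, so the start is reduced by one and the end is kept. The sketch below illustrates that on records like the ones quoted above. It is not Galaxy's converter; gff3_to_bed is a hypothetical name, and it assumes tab-separated GFF3 with nine columns, using the ID attribute (or the feature type when there is no ID) as the BED name.

```python
def gff3_to_bed(lines):
    """Convert GFF3 feature lines to (chrom, start, end, name, score, strand)."""
    bed = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and ## directives / comments
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GFF3 feature line
        seqid, _source, ftype, start, end, score, strand, _phase, attrs = fields
        attributes = dict(
            item.split("=", 1) for item in attrs.split(";") if "=" in item
        )
        name = attributes.get("ID", ftype)
        bed_score = "0" if score == "." else score
        # GFF3 is 1-based inclusive; BED is 0-based half-open.
        bed.append((seqid, int(start) - 1, int(end), name, bed_score, strand))
    return bed
```

With the first gene record above, the GFF3 interval 1903..9817 on Chr1 becomes the BED interval 1902..9817.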

How to run a pipeline on many data sets ?
by Jean-François Dufayard 04 Nov '10

Dear Galaxy users,

I would like to do a quite simple operation, in theory: I've configured a Galaxy pipeline on a local Galaxy server (installed in a Sun Grid Engine cluster), and I would like to run it on several datasets (several thousands, in a directory) and get result files in another directory.

With the web interface, using libraries or not, I didn't find any solution. Does a simple solution exist? Or has anybody experienced the same problem?

Sincerely yours,

--
Jean-François Dufayard
Research engineer - ARCAD project
CIRAD - Montpellier - France
