October 2010 - galaxy-dev - lists.galaxyproject.org

Fastq_filter fails on large files
by Isabelle Phan 18 Oct '10

Hello,

The tool fastq_filter worked on 1M reads, but fails (hangs) on 15M reads. I had to kill the job after the user let it run for a whole day. The debug.txt file containing a Python function "fastq_read_pass_filter" is created in the files/000/dataset_xxx_files directory. I am getting no error from the Galaxy server. I wonder what could cause fastq_filter to fail?

The fastx equivalent tool works, but it lacks the options of fastq_filter. I'd be grateful for any hints to help me get fastq_filter to work on large FASTQ files.

Thanks
Isabelle

data_column parameter with empty tabular file / FASTA filtering
by Peter 18 Oct '10

Hi all,

As part of the BLAST+ wrappers I wrote a short Python script to divide a FASTA file into those records with or without an ID found in a column of a tabular file:

http://bitbucket.org/galaxy/galaxy-central/src/tip/tools/ncbi_blast_plus/bl…
http://bitbucket.org/galaxy/galaxy-central/src/tip/tools/ncbi_blast_plus/bl…

While working on the wrapper I ran into a problem: in general a tabular file might contain sequence IDs in any column. When I tried using a data_column parameter, it did not work if the tabular file had no rows in it. I think this is actually quite a common situation (e.g. sequences with no BLAST hits, or any tabular file after a strict filter).

For BLAST tabular output only columns 1 and 2 contain sequence identifiers (query and subject), so I used a select parameter instead:

http://bitbucket.org/galaxy/galaxy-central/changeset/aa7f4bdc2eab

The functionality to filter a FASTA file using IDs in a tabular file is actually very general. As an example workflow, I want to be able to take a FASTA file of proteins and get a FASTA file of those proteins with predicted transmembrane helices.

(1) Upload FASTA file (already in Galaxy)
(2) Run TMHMM to get tabular file (see other thread for wrapper)
(3) Filter tabular file to get just positive results (already in Galaxy)
(4) Filter FASTA file using those IDs found in new tabular file (above script)

I therefore think it might make more sense for this script to live under tools/fasta_tools - does that seem reasonable? I would also need to generalise the help text etc - but how should the data_column problem be addressed?

Thanks,
Peter
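To make the filtering idea in this post concrete, here is a minimal sketch of splitting a FASTA file by IDs taken from one column of a tabular file. It is not the script linked above: the function names read_ids and split_fasta, the 1-based column argument, and the choice to skip "#" comment lines are all assumptions made for illustration. The point of the sketch is that a tabular file with no rows simply yields an empty ID set, so every record ends up in the "without a match" output.

```python
import io


def read_ids(tabular_handle, column):
    """Collect identifiers from a 1-based column of a tab-separated file.

    Hypothetical helper: blank lines and '#' comment lines are skipped,
    and a file with no rows returns an empty set rather than failing.
    """
    ids = set()
    for line in tabular_handle:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < column:
            raise ValueError("Row has only %d columns" % len(fields))
        ids.add(fields[column - 1].strip())
    return ids


def split_fasta(fasta_handle, ids, hits_handle, misses_handle):
    """Write each FASTA record to hits_handle or misses_handle by its ID.

    The ID is taken as the first whitespace-delimited word after '>'.
    """
    out = None
    for line in fasta_handle:
        if line.startswith(">"):
            words = line[1:].split(None, 1)
            identifier = words[0] if words else ""
            out = hits_handle if identifier in ids else misses_handle
        if out is not None:
            out.write(line)


if __name__ == "__main__":
    fasta = io.StringIO(">seq1 first\nMKV\n>seq2\nMTT\n")
    table = io.StringIO("")  # tabular file with no rows
    hits, misses = io.StringIO(), io.StringIO()
    split_fasta(fasta, read_ids(table, 1), hits, misses)
    print(misses.getvalue(), end="")
```

Reading the file one line at a time keeps memory use independent of the FASTA size; only the ID set is held in memory.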

sorry
by Davide Cittaro 15 Oct '10

Hi all,

Forget my previous mail, my mistake (it was intended for the samtools-devel mailing list).
BTW, I would like to use this mail to ask how to reset the password for the community shed...

d

/*
Davide Cittaro
Cogentech - Consortium for Genomic Technologies
via adamello, 16
20139 Milano
Italy
tel.: +39(02)574303007
e-mail: davide.cittaro(a)ifom-ieo-campus.it
*/

Abstract | Data structures and compression algorithms for high-throughput sequencing technologies
by Davide Cittaro 15 Oct '10

Hi there,

I guess most of you have seen this:
http://www.biomedcentral.com/1471-2105/11/514
Is it time to resume an old thread (about compression strategies)?

d

/*
Davide Cittaro
Cogentech - Consortium for Genomic Technologies
via adamello, 16
20139 Milano
Italy
tel.: +39(02)574303007
e-mail: davide.cittaro(a)ifom-ieo-campus.it
*/

Re: [galaxy-dev] rpy or perl
by Peter 14 Oct '10

On Tue, Oct 12, 2010 at 10:43 AM, Freddy <freddy.debree(a)wur.nl> wrote:
>
> Hi Peter,
>
> Running R from xml is not so straight forward to me.
> Is the interpreter "bash" or is it "R"?
> If "bash"
> then it should be running "R --vanilla --slave input < Rscript > output"
> if "R"
> then it should be running ? smthg like "source(Rscript)"
>
> Hope you can help...
>
> Freddy

Consider a simple R script, hello_world.R like this:

    print('Hello world');
    print(commandArgs());

Then in Galaxy I think you'd need the default interpreter ("bash") and the command to be:

    R --vanilla --file=hello_world.R --args "Args go here"

I'm assuming your script will write useful output to a file, and stdout can be discarded.

However, what I originally had in mind was setting the hash bang and executable bit with chmod so that you can do this at the command line:

    ./hello_world.R "Args go here"

Once that works, calling it from Galaxy should be trivial. I think one way to do this is with the little r package:
http://code.google.com/p/littler/

Peter
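Since the command line above passes an argument containing spaces, quoting matters once the arguments come from user input. The following is a small sketch, not part of Galaxy, that assembles the same R invocation from Python and quotes each argument for a POSIX shell. The helper name build_r_command is hypothetical, and shlex.join requires Python 3.8 or later.

```python
import shlex


def build_r_command(script, args):
    """Return a shell command line that runs an R script with trailing arguments.

    Hypothetical helper: it only builds the string and does not run R.
    Everything after --args is made available to the script via commandArgs().
    """
    argv = ["R", "--vanilla", "--file=" + script, "--args"] + list(args)
    # shlex.join quotes each element so spaces survive shell parsing
    return shlex.join(argv)


if __name__ == "__main__":
    print(build_r_command("hello_world.R", ["Args go here"]))
```

The resulting string uses single quotes around "Args go here" rather than double quotes, which a POSIX shell treats equivalently for this argument.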

Can I add header and footer into Galaxy framework?
by Zhe Chen 14 Oct '10

Hi,

Can I add a header and footer to the Galaxy framework? Or can I embed the Galaxy framework into my website without changing the look and feel of my website?

Thanks

Re: [galaxy-dev] concerning: "All SNPs in Personal Genomes"
by Greg Von Kuster 14 Oct '10

Hello Oskar,

"N" in the file "All SNPs in Personal Genomes" implies lack of information. The file has SNP data from the Southern African genomes as well as a lot of other published personal genomes. For the other personal genomes, we did not have the consensus calls at all locations (we just have the SNP locations and calls) and hence that lack of information is depicted by the base "N". For the Southern African genomes, the lack of information could be because of the method used e.g. genotyping does not give the consensus call on every location, or because we had no coverage on that location.

In the future, please send Galaxy questions / correspondence to one of the Galaxy mail lists (galaxy-dev(a)bx.psu.edu, galaxy-user(a)bx.psu.edu, galaxy-bugs(a)bx.psu.edu) instead of my personal email address. Thanks!

On Oct 14, 2010, at 5:00 AM, Oskar Hallatschek wrote:
> Dear Greg,
>
> could you please let me know whether an entry "N" in the file "All SNPs in Personal Genomes" refers to an unidentifiable base,
> and why there are so many such entries in the displayed Genome.
>
> many thanks,
> Oskar
> --
> Oskar Hallatschek
> MPI for Dynamics and Self-Organization
> Biological Physics and Evolutionary Dynamics
> Bunsenstr. 10, D 37073 Goettingen
> phone: +49-551-5176-670
> fax: +49-551-5176-669
> e-mail: oskar.hallatschek(a)ds.mpg.de
> http://www.evo.ds.mpg.de/
>

Greg Von Kuster
Galaxy Development Team
greg(a)bx.psu.edu

Re: [galaxy-dev] [galaxy-user] how to cancel a upload to data library?
by Nate Coraor 13 Oct '10

Kevin wrote:
> No worries
> It is through dir server
> Not sure why it should fail on it's own (unless u mean there's a
> default timeout).
> Sure will start up the galaxy instance again to see if it's still
> running the slow upload. And feedback here again?

Hi Kevin,

Are you using PBS, SGE, or another cluster job runner? Otherwise, these jobs should automatically be set to the "error" state upon server startup. Are you sure that they are actually running (at the command line), or are they just stuck in the "running" state in the library interface?

There was a bug fixed a while back that could be preventing these jobs from being set to error upon server startup. If you're running an older revision of Galaxy, I would suggest updating.

If you send us which database you're using (just the database type as in SQLite, Postgres, or MySQL, not the database itself), I can send you appropriate SQL to fix the job state. Unfortunately there is no interface in Galaxy to correct these.

--nate

>
> Sent from my iPod
>
> On 13-Oct-2010, at 1:23 AM, Jennifer Jackson <jen(a)bx.psu.edu> wrote:
>
> >Hi Kevin,
> >
> >Sorry for misunderstanding your question the first time ... you
> >are asking about a library upload (not a dataset).
> >
> >In this case, you will have to manually change the job state in
> >the database to 'error' to get Galaxy to stop recovering it at
> >start up.
> >
> >Our developers are curious whether you are using cluster job
> >runner and a URL or server directory upload though, since
> >otherwise the job should have failed on its own. If there is a
> >case through Galaxy's main tools where it doesn't fail, we'd like
> >to track that down if you have time to help.
> >
> >Thanks again!
> >
> >Jen
> >Galaxy team
> >
> >
> >On 10/4/10 6:34 PM, Kevin Lam wrote:
> >>Hi,
> >>I have made a mistake of uploading via file browser a large file
> >>through
> >>the data library function. How do I cancel it as it resumes upload
> >>whenever I start the local instance.
> >>
> >>Cheers
> >>Kevin
> >>
> >>_______________________________________________
> >>galaxy-user mailing list
> >>galaxy-user(a)lists.bx.psu.edu
> >>http://lists.bx.psu.edu/listinfo/galaxy-user
> >
> >--
> >Jennifer Jackson
> >http://usegalaxy.org

Report this error to the !Galaxy Team
by Andrew Stewart 13 Oct '10

For internal production instances, I'd much rather my users report an error/bug to our own internal ticketing system (and if something seems appropriate to feed to the Galaxy Team I can pass it along). I haven't looked at the guts of the error reporting form on the error page, but offhand does anyone know if it's readily possible to plug that into another bug tracking system? (i.e. does it just send an e-mail?)

NCBI BLAST+ wrappers in Galaxy?
by Peter 13 Oct '10

Hi all,

Something I expect to find useful in several analysis pipelines is a Galaxy wrapper for the NCBI BLAST+ tools (or even the old NCBI "legacy" BLAST tools if such a wrapper exists). I've been looking over the tools in galaxy-dist and galaxy-central and the only NCBI BLAST wrapper I can see is for MEGABLAST, under tools/metag_tools.

Are there more general NCBI BLAST+ wrappers that I have missed? Or is anyone already working on this?

Thanks,
Peter
