The tool fastq_filter worked on 1M reads, but fails (hangs) on 15M reads; I had to kill the job after the user let it run for a whole day. A debug.txt file containing a Python function "fastq_read_pass_filter" is created in the files/000/dataset_xxx_files directory, and I am getting no error from the Galaxy server.
I wonder what could cause fastq_filter to fail? The fastx equivalent tool works, but it lacks all the extra options of fastq_filter.
I'd be grateful for any hints to help me get fastq_filter working on large FASTQ files.
As part of the BLAST+ wrappers I wrote a short Python script to divide
a FASTA file into those records with or without an ID found in a given
column of a tabular file:
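I don't have the original script in front of me, but the idea can be sketched like this (function and file names are my own, not those of the actual wrapper; note an empty tabular file simply yields an empty ID set, so every record ends up in the "negative" output):

```python
# Sketch: split a FASTA file into records whose ID does (or does not)
# appear in a given column of a tab-separated file.

def load_ids(tabular_path, column=0):
    """Collect the values from one (0-based) column of a tabular file."""
    ids = set()
    with open(tabular_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) > column:
                ids.add(fields[column])
    return ids

def split_fasta(fasta_path, wanted, pos_path, neg_path):
    """Write records whose ID is in 'wanted' to pos_path, the rest to neg_path."""
    with open(fasta_path) as src, \
         open(pos_path, "w") as pos, open(neg_path, "w") as neg:
        out = neg
        for line in src:
            if line.startswith(">"):
                # FASTA ID is the first word after the ">".
                seq_id = line[1:].split(None, 1)[0]
                out = pos if seq_id in wanted else neg
            out.write(line)
```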
While working on the wrapper I ran into a problem: in general, a
tabular file might contain sequence IDs in any column. When I tried
using a data_column parameter, it did not work if the tabular file had
no rows in it. I think this is actually quite a common situation (e.g.
sequences with no BLAST hits, or any tabular file after a strict
filter). For BLAST tabular output only columns 1 and 2 contain
sequence identifiers (query and subject), so I used a select parameter instead.
The functionality to filter a FASTA file using IDs in a tabular file
is actually very general. As an example workflow, I want to be able
to take a FASTA file of proteins and get a FASTA file of those
proteins with predicted transmembrane helices:
(1) Upload FASTA file (already in Galaxy)
(2) Run TMHMM to get tabular file (see other thread for wrapper)
(3) Filter tabular file to get just positive results (already in Galaxy)
(4) Filter FASTA file using those IDs found in new tabular file (above script)
I therefore think it might make more sense for this script to live
under tools/fasta_tools - does that seem reasonable? I would also need
to generalise the help text etc - but how should the data_column
problem be addressed?
Hi all, please ignore my previous mail; my mistake (it was intended for the samtools-devel mailing list).
By the way, while I'm writing: how do I reset my password for the community tool shed?
Cogentech - Consortium for Genomic Technologies
via adamello, 16
Hi there, I guess most of you have seen this:
is it time to resume an old thread (about compression strategies)?
On Tue, Oct 12, 2010 at 10:43 AM, Freddy <freddy.debree(a)wur.nl> wrote:
> Hi Peter,
> Running R from XML is not so straightforward to me.
> Is the interpreter "bash" or is it "R"?
> If "bash", then it should be running "R --vanilla --slave input < Rscript >"
> If "R", then it should be running something like "source(Rscript)"?
> Hope you can help...
Consider a simple R script, hello_world.R like this:
Then in Galaxy I think you'd need the default interpreter
("bash") and the command to be:
R --vanilla --file=hello_world.R --args "Args go here"
I'm assuming your script will write useful output to a
file, and stdout can be discarded.
However, what I originally had in mind was setting the
hashbang (#!) line and the executable bit (with chmod) so that you
can do this at the command line:
./hello_world.R "Args go here"
Once that works, calling it from Galaxy should be trivial.
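The hashbang-plus-chmod pattern can be demonstrated in a few lines of Python (using a Python script here only because R may not be installed everywhere; the mechanics for hello_world.R are identical):

```python
import os
import stat
import subprocess
import tempfile

# Write a tiny script with a hashbang line, mark it executable, and run
# it directly -- the same pattern as ./hello_world.R "Args go here".
script = os.path.join(tempfile.mkdtemp(), "hello_world.py")
with open(script, "w") as handle:
    handle.write("#!/usr/bin/env python3\n"
                 "import sys\n"
                 "print('Hello', sys.argv[1])\n")

# chmod +x: add the owner-execute bit to the existing mode.
os.chmod(script, os.stat(script).st_mode | stat.S_IXUSR)

# The kernel reads the hashbang and hands the script to python3.
result = subprocess.run([script, "world"], capture_output=True, text=True)
print(result.stdout.strip())  # -> Hello world
```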
I think one way to do this is with the littler package (which provides an r front end for hashbang scripting):
An "N" in the file "All SNPs in Personal Genomes" indicates a lack of information. The file has SNP data from the Southern African genomes as well as many other published personal genomes. For the other personal genomes, we did not have the consensus calls at all locations (we just have the SNP locations and calls), so that lack of information is represented by the base "N". For the Southern African genomes, the lack of information could be due to the method used (e.g. genotyping does not give a consensus call at every location) or because we had no coverage at that location.
In the future, please send Galaxy questions / correspondence to one of the Galaxy mail lists ( galaxy-dev(a)bx.psu.edu, galaxy-user(a)bx.psu.edu, galaxy-bugs(a)bx.psu.edu ) instead of my personal email address.
On Oct 14, 2010, at 5:00 AM, Oskar Hallatschek wrote:
> Dear Greg,
> could you please let me know whether an entry "N" in the file "All SNPs in Personal Genomes" refers to an unidentifiable base,
> and why there are so many such entries in the displayed Genome.
> many thanks,
> Oskar Hallatschek
> MPI for Dynamics and Self-Organization
> Biological Physics and Evolutionary Dynamics
> Bunsenstr. 10, D 37073 Goettingen
> phone: +49-551-5176-670
> fax: +49-551-5176-669
> e-mail: oskar.hallatschek(a)ds.mpg.de
Greg Von Kuster
Galaxy Development Team
> No worries.
> It is through a server directory upload.
> Not sure why it should fail on its own (unless you mean there's a
> default timeout).
> Sure, I will start up the Galaxy instance again to see if it's still
> running the slow upload, and will feed back here again.
Are you using PBS, SGE, or another cluster job runner? Otherwise, these
jobs should automatically be set to the "error" state upon server startup.
Are you sure that they actually run (at the command line), or are they
just stuck in the "running" state in the library interface?
There was a bug fixed a while back that could be preventing these jobs
from being set to error upon server startup. If you're running an older
revision of Galaxy, I would suggest updating.
If you send us which database you're using (just the database type as in
SQLite, Postgres, or MySQL, not the database itself), I can send you
appropriate SQL to fix the job state. Unfortunately there is no
interface in Galaxy to correct these.
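The kind of SQL involved would be an UPDATE on the job table, along these lines (sketched here against an in-memory SQLite database; the table and column layout below is a simplified assumption, so verify against your actual Galaxy schema before running anything like this on a real database):

```python
import sqlite3

# Toy stand-in for a Galaxy database with jobs stuck in "running".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, state TEXT)")
conn.executemany("INSERT INTO job (state) VALUES (?)",
                 [("ok",), ("running",), ("running",)])

# The fix: flip every stuck "running" job to "error" so Galaxy stops
# trying to recover it at startup.
conn.execute("UPDATE job SET state = 'error' WHERE state = 'running'")
conn.commit()

stuck = conn.execute(
    "SELECT COUNT(*) FROM job WHERE state = 'running'").fetchone()[0]
print(stuck)  # -> 0
```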
> Sent from my iPod
> On 13-Oct-2010, at 1:23 AM, Jennifer Jackson <jen(a)bx.psu.edu> wrote:
> >Hi Kevin,
> >Sorry for misunderstanding your question the first time ... you
> >are asking about a library upload (not a dataset).
> >In this case, you will have to manually change the job state in
> >the database to 'error' to get Galaxy to stop recovering it at
> >start up.
> >Our developers are curious whether you are using cluster job
> >runner and a URL or server directory upload though, since
> >otherwise the job should have failed on its own. If there is a
> >case through Galaxy's main tools where it doesn't fail, we'd like
> >to track that down if you have time to help.
> >Thanks again!
> >Galaxy team
> >On 10/4/10 6:34 PM, Kevin Lam wrote:
> >>I made the mistake of uploading a large file via the file browser
> >>in the data library function. How do I cancel it, as it resumes
> >>uploading whenever I start the local instance?
> >>galaxy-user mailing list
> >Jennifer Jackson
For internal production instances, I'd much rather my users report an error/bug to our own internal ticketing system (and if something seems appropriate to feed to the Galaxy Team, I can pass it along). I haven't looked at the guts of the error-reporting form on the error page, but offhand, does anyone know if it's readily possible to plug that into another bug-tracking system? (i.e. does it just send an e-mail?)
Something I expect to find useful in several analysis pipelines is
a Galaxy wrapper for the NCBI BLAST+ tools (or even the old
NCBI "legacy" BLAST tools if such a wrapper exists).
I've been looking over the tools in galaxy-dist and galaxy-central and
the only NCBI BLAST wrapper I can see is for MEGABLAST, under
Are there any more general NCBI BLAST+ wrappers that I have
missed? Or is anyone already working on this?