Thank you for your answer!
I have used the Add or Replace Groups tool and it worked well, so
I was able to use the FreeBayes tool with no problem!
Now I have another question: I have been pre-processing my data with the
NGS: GATK tools according to their Best Practices and I am ready for SNP
calling. I have read the Unified Genotyper documentation and, since I am
working with bacterial genome sequences, I need to set the
--sample_ploidy argument to 1 (the default is 2). I cannot find this
option in the Galaxy version of the tool, not even under the advanced
options. How can I do that?
Thank you very much!
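For reference, outside Galaxy the same setting is passed directly on the GATK command line. A minimal sketch (the jar path and all file names are placeholders for your own installation and data):

```shell
# Haploid SNP calling with UnifiedGenotyper outside Galaxy.
# reference.fasta, input.bam and calls.vcf are placeholder names.
java -jar GenomeAnalysisTK.jar \
    -T UnifiedGenotyper \
    -R reference.fasta \
    -I input.bam \
    --sample_ploidy 1 \
    -o calls.vcf
```

If the Galaxy tool form does not expose the option, the wrapper may simply not pass it through; running the command line directly is one workaround.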
> Message: 3
> Date: Fri, 27 Sep 2013 14:02:50 -0700
> From: Jennifer Jackson <jen(a)bx.psu.edu>
> To: garzetti <garzetti(a)mvp.uni-muenchen.de>
> Cc: galaxy-user(a)bx.psu.edu
> Subject: Re: [galaxy-user] SNP calling problems
> Message-ID: <5245F27A.7020200(a)bx.psu.edu>
> Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
> Hi Debora,
> Sorry to hear that you are having problems. We can help get you going
> again! Please see below:
> On 9/26/13 7:20 AM, garzetti wrote:
>> Dear all,
>> I have been looking for an answer to my problem in all the Galaxy
>> Support resources but with no success. I am sorry if this topic has
>> been already discussed!
>> So, I am analyzing MiSeq data on the main Galaxy.
>> I have Fastq files from 4 paired-end samples. After having checked the
>> quality with FastQC and groomed them, I have performed a BWA mapping,
>> filtered the results and converted the SAM to BAM files (for each
>> sample separately). I have then called SNPs with Freebayes and
>> SAMtools, encountering problems in both cases.
>> 1) SAMtools: if I run the Generate pileup tool, then the Filter pileup
>> doesn't recognize any valid format in the files I have in my History
>> and I cannot go on with the analysis. Why is that? What can I do?
> Make sure that the output format is set as "pileup" and the tool will
> accept the input. Click on the pencil icon to make the datatype
> assignment change.
> Note that Mpileup has an option to produce .bcf format, and that is not
> the same as pileup. If you have selected that type of output, then
> either re-run the tool with options that create pileup format, or
> convert bcf -> vcf and continue downstream with one of the tools that
> accept vcf format.
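The bcf -> vcf conversion mentioned above can be done with the bcftools binary that ships alongside SAMtools 0.1.x; a minimal sketch, with variants.bcf as a placeholder for your Mpileup output:

```shell
# Convert binary BCF (e.g. from samtools mpileup -g) to plain-text VCF.
bcftools view variants.bcf > variants.vcf
```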
>> 2) I have performed variant calling with FreeBayes on the single BAM
>> files and on one BAM file merged from all four of my BWA mapping
>> files. In all cases, the last column is "unknown", when it should be
>> the name of my sample. This is not a big deal for the single VCF
>> files, but with the merged BAM file I cannot tell from which sample
>> the SNPs were detected. I think there is a problem with the BAM files,
>> which are not properly indexed. Also, FreeBayes needs an RG tag.
>> Is there a tool in Galaxy I can use to index BAM files, adding the RG
>> tag?
> The tool "NGS: Picard (beta) -> Add or Replace Groups" can be used to
> annotate SAM/BAM files. This tool can be a bit picky about formats, so
> just watch for that if you get an error.
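Outside Galaxy, the equivalent Picard step looks roughly like this (classic jar-per-tool syntax of Picard 1.x; all file names and read-group values are placeholders):

```shell
# Add a read group so downstream callers (FreeBayes, GATK) can tell
# samples apart; also coordinate-sort and index the output BAM.
java -jar AddOrReplaceReadGroups.jar \
    INPUT=input.bam \
    OUTPUT=with_rg.bam \
    RGID=sample1 \
    RGLB=lib1 \
    RGPL=illumina \
    RGPU=unit1 \
    RGSM=sample1 \
    SORT_ORDER=coordinate \
    CREATE_INDEX=true
```

RGSM is the sample name that ends up in the final VCF column instead of "unknown".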
> /_Quick tip:_/ You can click on the bug icon on a failed dataset to see
> the complete error message; it will often tell you exactly what is
> wrong so that you can correct it. (This doesn't automatically submit a
> bug report, which is good to know when you are in a hurry at night or
> on weekends, or just want to troubleshoot yourself.) You can use this
> on any error dataset to get more information when the dataset's "i"
> info button's stderr/stdout links or the attributes "Info" field do not
> provide enough details. This works on servers that have bug reporting
> enabled: the public Main server does, and it is straightforward to
> configure on local and cloud instances, including your own, even one
> used only for small file manipulations or backup/storage. (Backups of
> key files are always a good idea when doing analysis, anywhere.) See
> the Admin wiki section for more.
> Going forward: there is a short screencast about the Learning resources
> in Galaxy, published as a Page. It will be uploaded to Vimeo in the
> next 24 hours, and will likely be updated as the infrastructure changes
> on Main settle out over the next few weeks, but for now here is the
> link: click on the "Learning Resources" graphic to launch the quick
> tour.
> Galaxy team's Vimeo account: http://vimeo.com/channels/581769
> We are uploading all of our videos, old and new, over the next few
> days. We hope our users like them and follow along. The public Main
> server will soon link directly to this content from the center of the
> home page, as part of the "New & Improved" Galaxy experience! I won't
> give an ETA, as this is in progress, but it is expected very soon.
> Good luck and let us know if you need more help,
> Galaxy team
>> I hope someone can help me!
>> Thank you very much!
Debora Garzetti, PhD Student
Max von Pettenkofer-Institute, LMU
Phone: +49 (0)89 2180 72915
I don't know why I still have this problem.
I have run TopHat2 with different datasets; sometimes it runs fine, but sometimes I get this error.
I run only one job at a time on a virtual machine with 8 GB of memory, without using the Galaxy platform. I tried the --no-coverage-search option, but it changed nothing.
De : Delong, Zhou
Envoyé : 27 août 2013 9:36
À : galaxy-user(a)bx.psu.edu
Objet : Tophat Error: segment-based junction search failed with err
I have run several analyses with TopHat2 on my local instance of Galaxy and I get this error for all of them:
segment-based junction search failed with err = 1 or -9
Here is an example of full error report:
Error in tophat:
[2013-08-23 11:56:58] Beginning TopHat run (v2.0.6)
[2013-08-23 11:56:58] Checking for Bowtie
Bowtie version: 18.104.22.168
[2013-08-23 11:56:58] Checking for Samtools
Samtools version: 0.1.18.0
[2013-08-23 11:56:58] Checking for Bowtie index files
[2013-08-23 11:56:58] Checking for reference FASTA file
[2013-08-23 11:56:58] Generating SAM header for /usr/local/data/bowtie2/hg19/hg19
quality scale: phred33 (default)
[2013-08-23 11:58:04] Preparing reads
left reads: min. length=50, max. length=50, 145339247 kept reads (34946 discarded)
right reads: min. length=50, max. length=50, 145340153 kept reads (34040 discarded)
[2013-08-23 14:16:21] Mapping left_kept_reads to genome hg19 with Bowtie2
[2013-08-24 01:04:37] Mapping left_kept_reads_seg1 to genome hg19 with Bowtie2 (1/2)
[2013-08-24 03:38:22] Mapping left_kept_reads_seg2 to genome hg19 with Bowtie2 (2/2)
[2013-08-24 05:29:58] Mapping right_kept_reads to genome hg19 with Bowtie2
[2013-08-24 19:50:22] Mapping right_kept_reads_seg1 to genome hg19 with Bowtie2 (1/2)
[2013-08-24 22:36:38] Mapping right_kept_reads_seg2 to genome hg19 with Bowtie2 (2/2)
[2013-08-25 01:40:37] Searching for junctions via segment mapping
Coverage-search algorithm is turned on, making this step very slow
Please try running TopHat again with the option (--no-coverage-search) if this step takes too much time or memory.
Error: segment-based junction search failed with err =-9
Collecting potential splice sites in islands
cp: cannot stat `/home/galaxy/galaxy-dist/database/job_working_directory/000/515/tophat_out/deletions.bed': No such file or directory
cp: cannot stat `/home/galaxy/galaxy-dist/database/job_working_directory/000/515/tophat_out/insertions.bed': No such file or directory
I did some research on the internet and it looks like a memory problem to me. Is there any solution other than rerunning these jobs on a more powerful machine?
Also, why did Bowtie/TopHat discard different numbers of reads from the left and right files? What will the impact be? Does it mean that even if the paired-end inputs don't match exactly, it is still possible to run the job?
NOTE: please apply as soon as possible; the application period is
exceptionally short due to operational reasons.
Automated and reproducible analysis of NGS data
IMPORTANT DATES for ARANGS13
Deadline for applications: October 8th 2013
Notification of acceptance dates: October 15th 2013
Course date: October 21st - October 24th 2013
Next generation sequencing (NGS) technologies for DNA have resulted in
an ever bigger deluge of data. Researchers are finding that analysing
such data sets is becoming the bottleneck in their work. In many
cases, several steps in these analyses are fairly generic (e.g.
quality control filtering, alignment to reference sequences, typing)
so that off-the-shelf pipelines can be applied. In other cases, novel
research approaches require development of new analysis pipelines.
Either way, all analysis steps should be repeatable and any changes
made to the data (e.g. renaming, annotation, alignment) should be
recorded so that the provenance of the results is clear and inferences
are reproducible. In this brief workshop we will establish several
best practices of reproducibility and provenance recording in the
(comparative) analysis of data obtained by NGS. In doing so we will
encounter the commonly used technologies that enable these best
practices by working through use cases that illustrate the underlying
principles. Building on the basis of workflow development, we will
further illustrate how custom-built workflows can be manipulated using
graphical platforms (e.g. Galaxy, Taverna, etc.).
Standardized project organization
Projects 'runnable' without user intervention
No loss of data, metadata, parameters or source code through versioning
Sharing of scripts and workflows
Next generation sequencing platforms
File formats (e.g. FASTQ, SAM/BAM, GFF3)
Command-line executables, command line scripting and batching
High-level programming with domain-specific toolkits
Revision control systems
Workflow environments (both visual and command line)
Phylogenetic placement of metagenomic data
Typing of pathogens
Comparative analysis of multicellular genomic data
Post-assembly: handling richly annotated genomes
More information, including application instructions, available at
Instituto Gulbenkian de Ciência
Tel +351 21 4407912
Can anyone tell me if the ability to randomly sample a sam or bam file
(view -s) is available via Galaxy samtools? I can't find it but it might
be an option that I am missing.
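For reference, on the command line the subsampling in question looks like this. In recent SAMtools versions the -s argument takes the form INT.FRAC, where the integer part seeds the random number generator and the fractional part is the fraction of reads to keep (input.bam is a placeholder name):

```shell
# Keep roughly 25% of reads, using 42 as the random seed.
samtools view -s 42.25 -b input.bam > subsampled.bam
```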
I am trying to launch Galaxy locally but I'm getting the following error:
tkx417:galaxy pascarellagiovanni$ sh run.sh
Traceback (most recent call last):
File "./scripts/paster.py", line 33, in <module>
File "/Users/pascarellagiovanni/Desktop/galaxy/lib/galaxy/util/pastescript/serve.py", line 1049, in run
invoke(command, command_name, options, args[1:])
File "/Users/pascarellagiovanni/Desktop/galaxy/lib/galaxy/util/pastescript/serve.py", line 1055, in invoke
exit_code = runner.run(args)
File "/Users/pascarellagiovanni/Desktop/galaxy/lib/galaxy/util/pastescript/serve.py", line 220, in run
result = self.command()
File "/Users/pascarellagiovanni/Desktop/galaxy/lib/galaxy/util/pastescript/serve.py", line 643, in command
app = loadapp( app_spec, name=app_name, relative_to=base, global_conf=vars)
File "/Users/pascarellagiovanni/Desktop/galaxy/lib/galaxy/util/pastescript/loadwsgi.py", line 350, in loadapp
return loadobj(APP, uri, name=name, **kw)
File "/Users/pascarellagiovanni/Desktop/galaxy/lib/galaxy/util/pastescript/loadwsgi.py", line 374, in loadobj
File "/Users/pascarellagiovanni/Desktop/galaxy/lib/galaxy/util/pastescript/loadwsgi.py", line 399, in loadcontext
File "/Users/pascarellagiovanni/Desktop/galaxy/lib/galaxy/util/pastescript/loadwsgi.py", line 423, in _loadconfig
return loader.get_context(object_type, name, global_conf)
File "/Users/pascarellagiovanni/Desktop/galaxy/lib/galaxy/util/pastescript/loadwsgi.py", line 561, in get_context
File "/Users/pascarellagiovanni/Desktop/galaxy/lib/galaxy/util/pastescript/loadwsgi.py", line 620, in _context_from_explicit
value = import_string(found_expr)
File "/Users/pascarellagiovanni/Desktop/galaxy/lib/galaxy/util/pastescript/loadwsgi.py", line 125, in import_string
return pkg_resources.EntryPoint.parse("x=" + s).load(False)
File "/Users/pascarellagiovanni/Desktop/galaxy/lib/pkg_resources.py", line 1954, in load
entry = __import__(self.module_name, globals(),globals(), ['__name__'])
File "/Users/pascarellagiovanni/Desktop/galaxy/lib/galaxy/web/__init__.py", line 4, in <module>
from framework import expose
File "/Users/pascarellagiovanni/Desktop/galaxy/lib/galaxy/web/framework/__init__.py", line 40, in <module>
from babel.support import Translations
File "/Users/pascarellagiovanni/Desktop/galaxy/eggs/Babel-0.9.4-py2.7.egg/babel/support.py", line 29, in <module>
from babel.dates import format_date, format_datetime, format_time, LC_TIME
File "/Users/pascarellagiovanni/Desktop/galaxy/eggs/Babel-0.9.4-py2.7.egg/babel/dates.py", line 34, in <module>
LC_TIME = default_locale('LC_TIME')
File "/Users/pascarellagiovanni/Desktop/galaxy/eggs/Babel-0.9.4-py2.7.egg/babel/core.py", line 642, in default_locale
return '_'.join(filter(None, parse_locale(locale)))
File "/Users/pascarellagiovanni/Desktop/galaxy/eggs/Babel-0.9.4-py2.7.egg/babel/core.py", line 763, in parse_locale
raise ValueError('expected only letters, got %r' % lang)
ValueError: expected only letters, got 'utf-8'
Does anybody know what this is about?
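This particular ValueError usually means the shell locale is set to a bare encoding (e.g. LC_ALL=UTF-8) rather than a full language_REGION.encoding name, so Babel's parse_locale rejects it. A likely fix, assuming the en_US locale is available on the machine, is to export a well-formed locale before launching Galaxy:

```shell
# Babel expects a full locale name such as en_US.UTF-8, not just "utf-8".
export LANG=en_US.UTF-8
export LC_ALL=en_US.UTF-8
# then relaunch Galaxy: sh run.sh
```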
Yesterday I believed (incorrectly) that the NGS queue was moving again;
it is now under review once more. The grey queued jobs will eventually
execute, so I wouldn't delete them quite yet, but jobs are not
processing at this time. We will update the banner on the public Main
server if this is extended.
For high-priority work, moving to an alternate Galaxy solution is an
option. A cloud Galaxy is one good choice: it is a sure thing, but has
associated costs. Other public servers are also potential choices; each
has different tools available and other requirements, so check them out
and see what may work for you. Links to review are here:
Good luck, and thanks for the patience during our upgrade,
On 10/1/13 1:53 AM, Boaz Shaanan wrote:
> Thanks Jennifer. My job is still in the queue, so I'll keep it that way. From other people's complaints, it looks as if it's mostly the mapping jobs that have problems (mine is one like that too: a Lastz run). Any particular reason, or just the general server performance issue?
> Boaz Shaanan, Ph.D.
> Dept. of Life Sciences
> Ben-Gurion University of the Negev
> Beer-Sheva 84105
> E-mail: bshaanan(a)bgu.ac.il
> Phone: 972-8-647-2220 Skype: boaz.shaanan
> Fax: 972-8-647-2992 or 972-8-646-1710
> From: Jennifer Jackson [jen(a)bx.psu.edu]
> Sent: Tuesday, October 01, 2013 12:27 AM
> To: בעז שאנן
> Cc: galaxy-user(a)lists.bx.psu.edu
> Subject: Re: [galaxy-user] main galaxy server is down?
> Hello Boaz,
> Performance should be improved by now. Please allow your jobs to run if
> still queued. If any error due to fileserver or cluster issues, please
> simply re-run. Our apologies for these inconveniences, improvements are
> due very soon!
> Galaxy team
> On 9/27/13 2:37 PM, Boaz Shaanan wrote:
>> Is the main galaxy server down? Or very sloooow? I have a Lastz job (not too demanding and one that has been run several times before) waiting for a long time already.
>> Boaz Shaanan, Ph.D.
>> Dept. of Life Sciences
>> Ben-Gurion University of the Negev
>> Beer-Sheva 84105
>> E-mail: bshaanan(a)bgu.ac.il
>> Phone: 972-8-647-2220 Skype: boaz.shaanan
>> Fax: 972-8-647-2992 or 972-8-646-1710
>> The Galaxy User list should be used for the discussion of
>> Galaxy analysis and other features on the public server
>> at usegalaxy.org. Please keep all replies on the list by
>> using "reply all" in your mail client. For discussion of
>> local Galaxy instances and the Galaxy source code, please
>> use the Galaxy Development list:
>> To manage your subscriptions to this and other Galaxy lists,
>> please use the interface at:
>> To search Galaxy mailing lists use the unified search at:
> Jennifer Hillman-Jackson
I am having problems accessing my Galaxy account. I forgot my password and asked for it to be reset. When the new password was sent to me, it didn't work when I tried it. I've reset it four times, and each time the new password hasn't worked to log in. I hope you can solve the problem.
London, NW7 1AA
Tel: 44 (0)2088162426
Fax: 44 (0)2088162523
I installed a new copy of Galaxy today and then added the bwa_wrappers
tool. After I upload my reference genome and left/right reads, I get
output like this each time I try to run BWA for Illumina:
The alignment failed.
Error aligning sequence. [bwa_aln] 17bp reads: max_diff = 2
[bwa_aln] 38bp reads: max_diff = 3
[bwa_aln] 64bp reads: max_diff = 4
[bwa_aln] 93bp reads: max_diff = 5
[bwa_aln] 124bp reads: max_diff = 6
[bwa_aln] 157bp reads: max_diff = 7
[bwa_aln] 190bp reads: max_diff = 8
[bwa_aln] 225bp reads: max_diff = 9
[bwt_restore_bwt] fail to open file
/bin/sh: line 1: 11607 Aborted bwa aln -t 4 -I
If I manually run the 'bwa index' command on dataset_4.dat it works, but
in the past this seemed to happen automatically. Any clue what's
going on here?
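The "[bwt_restore_bwt] fail to open file" line indicates that bwa aln could not find the index files (.bwt and companions) next to the reference, so indexing the reference manually before aligning is a reasonable workaround; a sketch, where reads.fastq and aln.sai are placeholder names:

```shell
# Build the BWT index the wrapper apparently failed to create, then align.
bwa index dataset_4.dat
bwa aln -t 4 -I dataset_4.dat reads.fastq > aln.sai
```

The -I flag tells bwa aln the input qualities are Illumina 1.3+ encoded, matching the command shown in the error output.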