suggestion for multithreading
by Louise-Amélie Schmitt
Hello everyone,
I'm using TORQUE with Galaxy, and we noticed that if a tool is
multithreaded, the number of cores it needs is not communicated to PBS,
leading to job crashes if the required resources are not available when
the job is submitted.
Therefore I slightly modified the code in
lib/galaxy/jobs/runners/pbs.py as follows:
# define PBS job options
attrs.append( dict( name = pbs.ATTR_N, value = str( "%s_%s_%s" % ( job_wrapper.job_id, job_wrapper.tool.id, job_wrapper.user ) ) ) )
mt_file = open('tool-data/multithreading.csv', 'r')
for l in mt_file:
    l = string.split(l)
    if ( l[0] == job_wrapper.tool.id ):
        attrs.append( dict( name = pbs.ATTR_l, resource = 'nodes', value = '1:ppn=' + str(l[1]) ) )
        attrs.append( dict( name = pbs.ATTR_l, resource = 'mem', value = str(l[2]) ) )
        break
mt_file.close()
job_attrs = pbs.new_attropl( len( attrs ) + len( pbs_options ) )
The CSV file contains a list of the multithreaded tools, each line
containing:
<tool id>\t<number of threads>\t<memory needed>\n
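For what it's worth, the same lookup could be done once at startup instead of re-reading the file for every job submission. A minimal sketch of that idea (the file path comes from the patch above; the function name and dict layout are made up):

```python
import csv

def load_multithreading_map(path='tool-data/multithreading.csv'):
    """Parse the per-tool resource table described above, one tool
    per line: <tool id>\t<number of threads>\t<memory needed>."""
    resources = {}
    with open(path) as f:
        for row in csv.reader(f, delimiter='\t'):
            if len(row) >= 3:
                tool_id, threads, mem = row[0], int(row[1]), row[2]
                resources[tool_id] = {'ppn': threads, 'mem': mem}
    return resources
```

The job runner could then do a single dict lookup per job rather than a file scan.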
And it works fine; the jobs wait their turn properly, but the
information is duplicated. Perhaps something similar could be included
in Galaxy's original code (if it is not already the case; I may not be
up-to-date) without duplicating data.
I hope that helps :)
Best regards,
L-A
11 years, 6 months
Mosaik with Paired Reads
by John David Osborne
Hello,
I'm trying to run Mosaik on our Galaxy instance on Illumina paired reads. However, when I select "paired reads" and Illumina as input options, I can still only select one of the two FASTQ files as input. No second file selector appears as it does with BWA, Bowtie, etc.
Can anybody tell me what is going on? Is this a known issue?
-John
11 years, 6 months
keeping additional data in the aaChanges tool
by Ximena Bonilla
Dear Galaxy staff,
I have recently started using your tool and it has been really helpful,
thank you!
When using Human Genome Variation, aaChanges, I would like to keep some
extra lines in the output file from either of the input files. In the tool
description it says I should be able to keep them:
"...chromosome, start, and end position as well as the SNP. The SNP can be
given using ambiguous-nucleotide symbols or a list of two to four alleles
separated by '/'. *Any other columns in the first input file will not be
used but will be kept for the output*. The second input file contains..."
However, I haven't found a way of actually having them in the output file.
What am I missing or doing incorrectly?
What I've been trying to keep, by the way, are rs IDs or Ensembl gene IDs.
Thank you in advance for your answer.
Kind regards,
Ximena
11 years, 6 months
Fwd: Workflows with conditional statements
by Florent Angly
I filed an enhancement report, since the workflow conditional facility
does not appear to exist in Galaxy:
https://bitbucket.org/galaxy/galaxy-central/issue/547/conditional-workflo...
Best,
Florent
-------- Original Message --------
Subject: Workflows with conditional statements
Date: Wed, 18 May 2011 10:31:21 +1000
From: Florent Angly <florent.angly(a)gmail.com>
To: galaxy-user(a)lists.bx.psu.edu <galaxy-user(a)lists.bx.psu.edu>
Hi all,
I was wondering if there is a way to put conditional statements in a
Galaxy workflow.
This would be useful, for example, in the case of a workflow that has an
optional advanced option that the user can click. This advanced option
would add some extra steps to the data processing.
Another example of how this could be useful is if inside a workflow, the
data needs to be processed differently based on the results of previous
workflow steps. Say you have a workflow that takes some sequences and
calculates their average length. Using a conditional statement, the
workflow would put the data through a de Bruijn assembler if the reads
are short, but through a traditional overlap-layout-consensus assembler
if the reads are long.
Are conditional statements possible in Galaxy workflows and I just don't
know how to use them?
Best,
Florent
11 years, 6 months
RNA-seq Galaxy workflow for PE barcoded samples?
by Whyte, Jeffrey
Hello,
I posted to the seqanswers forum, but have not received any feedback. I am working with RNA-seq Illumina data files in Galaxy (http://main.g2.bx.psu.edu/). The two files are 100bp paired-end reads, multiplexed with barcoding to distinguish samples. The barcodes are the first four bases of the sequences in the s_7_1_sequence.txt file.
Would the following Galaxy workflow be correct?
1. Upload both s_7_1_sequence.txt and s_7_2_sequence.txt to Galaxy with the reference genome selected
2. Run NGS: QC and manipulation --> FASTQ Groomer on each file to convert to Sanger FASTQ
3. Run NGS: QC and manipulation --> FASTQ joiner to combine the data from the two files
4. Run FASTX-TOOLKIT FOR FASTQ DATA --> Barcode Splitter to generate separate FASTQ files for each barcode group
5. Run NGS: RNA Analysis --> Tophat to map the reads from each group to the reference genome
The problem I am having is that if I select paired-end for the library in Tophat, it requests two FASTQ files. Would I have to use FASTQ Splitter to separate the joined FASTQ files? If there is a more standard way to handle these types of barcoded files, I would appreciate hearing about this workflow.
Thanks very much in advance,
jjw
P.S. Galaxy is an incredibly useful resource. Thanks!
11 years, 6 months
Workflow API
by Simon Lank
Hi Again Galaxy team.
I'm attempting to use the new workflow API, which I understand is still in
development. I created a test workflow with a single input, and was able to
use 'example_watch_folder.py' to successfully execute it as a history and
get the output to a specified location.
My question is how I can modify the script to accept multiple inputs (i.e.
how do I define which files in the input folder should go to each workflow
input) and whether there's a way to specify runtime parameters. For
instance, the workflow I want to execute has a filter step on a tabular
input as one of its later steps, which needs to be configured at runtime.
How would I specify this in the 'watch_folder.py' parameters? Or is this
not possible yet?
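In case it helps, here is a rough sketch of the kind of payload that example_watch_folder.py-style scripts POST to the workflows API, assuming the ds_map layout that script used at the time. All IDs and the helper name below are made up; the real input step IDs come from querying the workflow via the API first, and whether runtime tool parameters can be passed this way depends on the Galaxy version:

```python
# Hypothetical sketch of a multi-input workflow submission payload.
# 'ds_map' maps each workflow input step id to one dataset; this is
# how multiple inputs would be distinguished in a single request.

def build_workflow_payload(workflow_id, history_name, input_map):
    """input_map: {workflow input step id -> history dataset id}."""
    return {
        'workflow_id': workflow_id,
        'history': history_name,
        'ds_map': {
            str(step_id): {'src': 'hda', 'id': dataset_id}
            for step_id, dataset_id in input_map.items()
        },
    }

payload = build_workflow_payload(
    'f2db41e1fa331b3e',          # made-up workflow id
    'api test history',
    {1: 'a799d38679e985db',      # made-up step -> dataset ids
     2: '33b43b4e7093c91f'})
```

The payload itself is just a dict, so a wrapper in another language (e.g. Perl) only needs to produce the same JSON shape.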
FYI, I know perl but not python, so ultimately I want to wrap the python
scripts into a larger perl script to execute recursive workflows.
Thanks for all your help. Galaxy is an amazing tool and the workflow API is
a fantastic improvement.
Simon
Simon Lank
Research Specialist
O'Connor Lab, WNPRC
555 Science Dr. Madison WI
(608) 265-3389
11 years, 6 months
Question regarding quality filtering of 454 amplicons
by Jackie Lighten
Hi,
I have a question for you guys regarding quality filtering.
I have a data set of double MID tagged 454 amplicons, from which I wish to
select high quality sequences above Q20.
The 454 quality filtering system seems to work differently from that given
for the Illumina sequencing i.e. 454 filtering takes high quality segments,
while Illumina (FASTQ) can select high quality full reads based on certain
parameters.
OK, so I know that the total length of my amplicon, including primers and
barcodes, is around 260 bp. If I then set the 454 quality filtering tool to
extract contiguous high-quality sequence of >260 bp, it gives me back around
45% of my raw data meeting this criterion, i.e. all 260 bp are above Q20. I
don't necessarily need this high stringency, as most bases may not be
informative.
But if I convert my 454 data to FASTQ format and then run the Illumina
filtering system, which also lets me set the number of bases allowed to
deviate from the Q20 criterion, I get back over 90% of my data (allowing
10 bp to deviate from Q20).
I then need to go ahead and convert back to 454 format.
Can you tell me if this is OK?
Will I lose/confuse information somewhere along these conversions?
It seems that if I do this, my barcodes are removed, as amplicons do not
sort properly when I parse them through my barcode filtering program.
Does anyone know of a program to filter 454 data based on average sequence
quality score that doesn't involve Linux and the Roche off-instrument
software? (I have no experience with Linux!)
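Filtering by average read quality is also straightforward to script outside Galaxy. A minimal sketch, assuming the reads have already been converted to Sanger-encoded FASTQ (Phred offset 33); the function names are made up:

```python
def mean_quality(qual_str, offset=33):
    """Average Phred score of one Sanger-encoded FASTQ quality string."""
    return sum(ord(c) - offset for c in qual_str) / float(len(qual_str))

def filter_fastq_by_mean_quality(lines, min_mean=20):
    """Yield 4-line FASTQ records whose mean quality is >= min_mean."""
    it = iter(lines)
    for header in it:
        seq, plus, qual = next(it), next(it), next(it)
        if mean_quality(qual.strip()) >= min_mean:
            yield header, seq, plus, qual
```

Note this keeps whole reads by average quality, unlike the 454 tool's contiguous-segment criterion, so barcodes at the read start are left intact.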
Thanks!
--
Jack Lighten,
Ph.D. Candidate,
Bentzen Lab,
Room 6078,
Department of Biology,
Dalhousie University,
Halifax, NS, B3H 4J1
Canada
Office:(902) 494-1398
Email: Jackie.Lighten(a)Dal.Ca
Profile: www.marinebiodiversity.ca/CHONe/Members/lightenj/profile/bio
11 years, 7 months
error occurred when converting the SOLID output to fastq
by Jia-Xing Yue
Hi, when I converted the SOLiD output to FASTQ, Galaxy always reported the following error. Does anybody know why?
-------------
An error occurred running this job:
Traceback (most recent call last):
  File "/galaxy/home/g2main/galaxy_main/tools/next_gen_conversion/solid2fastq.py", line 207, in <module>
    main()
  File "/galaxy/home/g2main/galaxy_main/tools/next_gen_conversion/solid2fastq.py", line 188, in main
    merge_reads_qual( fr, fq, con, trim_name=options.trim_name, out='db', double_encode=options.de, trim_first_base=options.trim_first_base, min_qual=options.min_qual, table_name="f3" )
  File "/galaxy/home/g2main/galaxy_main/tools/next_gen_conversion/solid2fastq.py", line 83, in merge_reads_qual
    cursor.execute('insert into %s values("%s","%s","%s")' % (table_name, defline, lines[0], qual ) )
sqlite3.OperationalError: near "?3": syntax error
-----------
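A plausible culprit is the %-interpolation in the last frame: if the defline (or quality string) contains quotes or other SQL-significant characters, the generated statement becomes syntactically invalid. Binding the values with `?` placeholders avoids this entirely; a small self-contained sketch (the table layout here is made up):

```python
import sqlite3

con = sqlite3.connect(':memory:')
cur = con.cursor()
cur.execute('create table f3 (name text, seq text, qual text)')

# A defline containing quotes would break naive %-interpolation,
# but parameter binding passes it through safely.
defline = 'read_1 "odd" name'
cur.execute('insert into f3 values (?, ?, ?)',
            (defline, 'ACGT', '!!!!'))
con.commit()

cur.execute('select name from f3')
row = cur.fetchone()
```

Table names cannot be bound as parameters, so the `%s` for `table_name` would still need separate validation.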
Thanks!
Jia-Xing
--
Jia-Xing Yue
Graduate Student
Ecology & Evolutionary Bio. -MS 170
Rice University
6100 Main Street
Houston TX 77005
Phone: 1-832-360-6228
E-mail: yjx(a)rice.edu
Blog: http://bestrok.blogspot.com/
11 years, 7 months
pileup analysis
by Andrea Edwards
Hello
I was wondering if there was anything available within galaxy that would
let you do the following with pileup files:
1) filter for homozygous SNVs (i.e. that do not contain the reference
genome allele in the genotype)
2) compare the pileup files for two (or more) individuals to find SNVs
unique to each individual, and to further limit this to homozygous SNVs
unique to each individual
3) compare the pileup files for two (or more) individuals to find shared
SNVs, and to further limit this to shared SNVs where the individuals
have different alleles (rare, as it would imply a triallelic SNV), and
then group the individuals according to the allele for each SNV.
I could only see the Filter pileup tool, but nothing for comparing
pileup files.
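Task (1) at least is simple to script. A rough sketch, assuming 10-column samtools consensus pileup where column 3 is the reference base and column 4 the consensus call (an IUPAC ambiguity code there indicates a heterozygote); the function name is made up:

```python
def homozygous_nonref(pileup_lines):
    """Yield pileup lines whose consensus call is a single
    unambiguous base that differs from the reference base."""
    for line in pileup_lines:
        fields = line.rstrip('\n').split('\t')
        ref, consensus = fields[2].upper(), fields[3].upper()
        if consensus in 'ACGT' and consensus != ref:
            yield line
```

Tasks (2) and (3) could build on this by loading each individual's calls into a dict keyed on (chromosome, position) and taking set differences or intersections.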
thanks
11 years, 7 months
MACS
by Sher, Falak
While using the MACS tool from Galaxy I get the following message with a BED file:
"Treatment tags and Control tags are uneven! FDR may be wrong"
Any suggestions on how to fix it?
I don't understand its implication; my data is from a single Illumina ChIP-Seq experiment. I used Bowtie from Galaxy for mapping.
F
11 years, 7 months