I have a list of genomic regions with some variants and would like to study
the correlation between these variants and epigenomic marks such as
From the ENCODE download page, I got some files corresponding to peaks of these
histone modifications and would like to know if there is a way to create a
pipeline using Galaxy to map my variants, by genomic region, to
the information I have from the histone modification peaks.
Is there someone who can point me to a step-by-step guide to get started
with Galaxy?
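
To make the mapping step concrete, here is a minimal sketch in plain Python (outside Galaxy) of what "mapping variants to peaks by genomic region" amounts to: for each variant position, find the peaks that contain it. It assumes the peak files are BED-like (chromosome, start, end in the first three columns, 0-based half-open); the function names and the example coordinates are hypothetical and not taken from any ENCODE file.

```python
from collections import defaultdict


def load_peaks(bed_lines):
    """Group peak intervals (0-based, half-open, as in BED) by chromosome."""
    peaks = defaultdict(list)
    for line in bed_lines:
        # Skip blank lines and common BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        peaks[chrom].append((int(start), int(end)))
    return peaks


def peaks_overlapping(peaks, chrom, pos):
    """Return every peak on `chrom` that contains the 0-based position `pos`."""
    return [(s, e) for s, e in peaks.get(chrom, []) if s <= pos < e]


# Hypothetical example data, for illustration only.
example_peaks = load_peaks([
    "chr1\t100\t200",
    "chr1\t150\t300",
    "chr2\t500\t600",
])
example_variants = [("chr1", 160), ("chr1", 250), ("chr2", 10)]
hits = {v: peaks_overlapping(example_peaks, *v) for v in example_variants}
```

The linear scan per variant is fine for small lists; for genome-wide data a sorted or indexed interval structure would be the usual choice.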
This should be simple but it is not. Forgive the newbie question.
I am doing ChIP-seq: Bowtie > SAM filter for mapped reads > MACS.
I want to create a wiggle file that displays in UCSC, but when I choose the
"WIG" option in MACS and then try to show it in UCSC, it treats each line of
the created WIG file as a separate track, and obviously does not show it as a
Is there a wiki page somewhere that can give me the basics? Or can someone
point me in the right direction?
I have a column c1 that has entries like "GXP_297346(PVALB/human)".
I'm trying to use Text Manipulation > Compute to strip off the "(...)" portion, leaving only the accession (which can vary in length).
I have tried a variety of things that work on my Python command line but fail here, for example:
This gets mangled:
An error occurred running this job: Expression "c1__ob__1:c1.find("(")__cb__" likely invalid.
An error occurred running this job: Expression "c1.split("(")__ob__0__cb__" likely invalid.
Please help. This is driving me crazy.
Searching the list, I find only
http://gmod.827538.n3.nabble.com/inputs-sanitization-tt2664336.html#a2664911 "Inputs sanitization" which seems to indicate this is a global mapper that can only be disabled with dire security consequences.
http://gmod.827538.n3.nabble.com/substring-sequence-on-coordinate-in-colu... "substring sequence on coordinate in columns" which doesn't ever answer the question about how to get compute to work.
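
For reference, the string operation itself can be sketched in plain Python as below. This is only an illustration of the intended result, not a verified Compute expression: the error messages above show square brackets being rewritten to `__ob__` and `__cb__`, and this sketch merely happens to avoid square brackets by using `str.partition` with tuple unpacking. Whether the Compute tool accepts anything like it is untested. The function name is hypothetical.

```python
def strip_parenthetical(value):
    """Return the text before the first "(", or the whole value if there is none."""
    accession, _sep, _rest = value.partition("(")
    return accession


example = "GXP_297346(PVALB/human)"
accession = strip_parenthetical(example)
```

Because `partition` splits on the first "(" only, the accession can be any length, and values without parentheses pass through unchanged.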
I'm using TORQUE with Galaxy, and we noticed that if a tool is
multithreaded, the number of needed cores is not communicated to pbs,
leading to job crashes if the required resources are not available when
the job is submitted.
Therefore I modified the code slightly, as follows, in
# define PBS job options
attrs.append( dict( name = pbs.ATTR_N, value = str( "%s_%s_%s" % ( job_wrapper.job_id, job_wrapper.tool.id, job_wrapper.user ) ) ) )
# look up the tool in the list of multithreaded tools
# (column indices assumed from the file format: tool id, threads, memory)
mt_file = open('tool-data/multithreading.csv', 'r')
for l in mt_file:
    l = l.split()
    if ( l[0] == job_wrapper.tool.id ):
        # request the cores and the memory listed for this tool
        attrs.append( dict( name = pbs.ATTR_l, resource = 'nodes', value = '1:ppn='+str(l[1]) ) )
        attrs.append( dict( name = pbs.ATTR_l, resource = 'mem', value = str(l[2]) ) )
job_attrs = pbs.new_attropl( len( attrs ) + len( pbs_options ) )
(sorry it didn't come out very well due to line breaking)
The csv file contains a list of the multithreaded tools, one per line, in the format:
<tool id>\t<number of threads>\t<memory needed>\n
And it works fine, the jobs wait for their turn properly, but
information is duplicated. Perhaps there would be a way to include
something similar in galaxy's original code (if it is not already the
case, I may not be up-to-date) without duplicating data.
I hope that helps :)
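
As a standalone illustration of the tab-separated file format described above, the lookup could be sketched like this, independent of Galaxy and of the pbs module. The function names, tool ids, and values are hypothetical; only the three-column layout and the '1:ppn=' resource string come from the post.

```python
def load_multithreading_table(lines):
    """Map tool id -> (threads, memory) from tab-separated lines."""
    table = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or not fields[0]:
            continue  # skip blank or malformed lines
        tool_id, threads, memory = fields[:3]
        table[tool_id] = (int(threads), memory)
    return table


def pbs_resources(table, tool_id):
    """Return the resource strings for a tool, or None if it is not listed."""
    if tool_id not in table:
        return None
    threads, memory = table[tool_id]
    return {"nodes": "1:ppn=%d" % threads, "mem": memory}


# Hypothetical tool ids and values, for illustration only.
example_lines = ["bowtie_wrapper\t4\t8gb\n", "bwa_wrapper\t8\t16gb\n"]
table = load_multithreading_table(example_lines)
resources = pbs_resources(table, "bowtie_wrapper")
```

Loading the table once into a dictionary avoids re-reading the file for every submitted job.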
I'm trying to run Mosaik on our Galaxy instance on Illumina paired reads. However, when I select "paired reads" and Illumina as the input option, I can still only select one of the two fastq files as input. No second file selector appears as it does with bwa, bowtie, etc.
Can anybody tell me what is going on - is this a known issue?
Dear Galaxy staff,
I have recently started using your tool and it has been really helpful.
When using Human Genome Variation, aaChanges, I would like to keep some
extra lines in the output file from either of the input files. In the tool
description it says I should be able to keep them:
"...chromosome, start, and end position as well as the SNP. The SNP can be
given using ambiguous-nucleotide symbols or a list of two to four alleles
separated by '/'. *Any other columns in the first input file will not be
used but will be kept for the output*. The second input file contains..."
However, I haven't found a way to actually keep them in the output file.
What am I missing/doing incorrectly?
By the way, what I've been trying to keep are rs IDs or Ensembl gene IDs.
Thank you in advance for your answer.
I am trying to run the Cufflinks installation in Galaxy on Solexa RNA-seq samples from HeLa cells.
Running Cuffcompare, according to the manual, should produce a tmap file listing FMI values for detected isoforms. However, my files only have either "100" or "0" in the FMI field, and the FPKM column contains only zeros.
Is there something wrong with my input files or parameter settings? Or is it rather a specific issue with Galaxy's Cufflinks installation?
The data in question is available here:
I am trying to use bowtie to assign reads to the S. cerevisiae genome. I
have data from paired-end SOLiD sequencing with two unique six-base-pair
barcodes. Can I use bowtie to make csfasta and qual files from my mixed
original data, split by barcode? I know I can use the trim option to remove
the barcode, but how do I specify only one?
I am trying to filter my fastq file on the condition that the quality score
of reads is less than a minimum score.
So far, I have tried both *fastq_quality_filter* and *Filter FASTQ* under *NGS:
QC and manipulation*, but I was not able to do it.
In the following you can see my fastq file.
@F4HZV5G02CX6WP rank=0000096 x=1092.0 y=1767.0 length=45
And these are quality scores.
[37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 39, 37,
37, 35, 35, 33, 35, 32, 29]
I want to filter bases if their quality scores are less than 33.
Any help would be greatly appreciated.
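
To make the goal concrete, here is a minimal sketch in plain Python (not a Galaxy tool) of one possible reading of "filter bases below 33": trim bases from the 3' end while their score is under the threshold, and separately report which positions fall below it. The function names are hypothetical, and the bases in the example are made up; only the nine trailing scores come from the read above.

```python
def trim_low_quality_tail(sequence, qualities, min_score):
    """Trim bases from the 3' end while their quality is below min_score."""
    if len(sequence) != len(qualities):
        raise ValueError("sequence and qualities must have the same length")
    end = len(qualities)
    while end > 0 and qualities[end - 1] < min_score:
        end -= 1
    return sequence[:end], qualities[:end]


def low_quality_positions(qualities, min_score):
    """Return the 0-based positions whose quality is below min_score."""
    return [i for i, q in enumerate(qualities) if q < min_score]


# Last nine scores of the read above; the bases are hypothetical placeholders.
tail_scores = [39, 37, 37, 35, 35, 33, 35, 32, 29]
tail_bases = "ACGTACGTA"
trimmed_bases, trimmed_scores = trim_low_quality_tail(tail_bases, tail_scores, 33)
flagged = low_quality_positions(tail_scores, 33)
```

Trimming only from the end keeps the read contiguous; removing low-quality bases from the middle of a read would change the sequence itself and is usually not what is wanted.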