July 2011 - galaxy-user - lists.galaxyproject.org

ChIP-Seq, ENCODE Peaks and Galaxy
by Radhouane Aniba 07 Sep '11

Hi everyone, I have a list of genomic regions with some variants and would like to study the correlation between these variants and epigenomic marks such as histone modifications. From the ENCODE download page, I got some files corresponding to peaks of these histone modifications, and I would like to know if there is a way to create a pipeline using Galaxy to map my variants, by genomic region, to the information I have from the histone modification peaks. Can someone point me to a step-by-step guide to get started with Galaxy? Thank you, Rad

wiggle file
by Richard Mark White 31 Aug '11

Hi, this should be simple but it is not, so forgive the newbie question. I am doing ChIP-seq: Bowtie > SAM filter for mapped reads > MACS. I want to create a wiggle file that displays in UCSC, but when I choose the "WIG" option in MACS and then try to show it in UCSC, it treats each line of the created WIG file as a separate track, and obviously does not show it as a graph. Is there a wiki page somewhere that can give me the basics? Or can someone point me in the right direction? Thanks, Rich

Text Manipulation > Compute > c1[1:c1.find("(")] fails
by Robert Curtis Hendrickson 22 Aug '11

Folks, I have a column c1 that has entries like "GXP_297346(PVALB/human)". I'm trying to use Text Manipulation > Compute to strip off the "(...)" portion, leaving only the accession (which can vary in length). I have tried a variety of things that work on my Python command line but fail here, for example: c1[1:c1.find("(")] or c1.split('(')[0]. These get mangled: An error occurred running this job: Expression "c1__ob__1:c1.find("(")__cb__" likely invalid. Or: An error occurred running this job: Expression "c1.split("(")__ob__0__cb__" likely invalid. Please help. This is driving me crazy. Searching the list, I find only http://gmod.827538.n3.nabble.com/inputs-sanitization-tt2664336.html#a2664911 "Inputs sanitization", which seems to indicate this is a global mapper that can only be disabled with dire security consequences, and http://gmod.827538.n3.nabble.com/substring-sequence-on-coordinate-in-column… "substring sequence on coordinate in columns", which never answers the question of how to get Compute to work. Thanks, Curtis
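For reference, the intended string operation can be checked in plain Python outside Galaxy. This is only a sketch of the expression logic and does not address the sanitization error raised by the Compute tool; `strip_annotation` is a hypothetical helper name, not part of Galaxy. Note that a slice starting at index 1, as in `c1[1:c1.find("(")]`, also drops the first character of the accession.

```python
def strip_annotation(value):
    """Return the text before the first '(' (the accession).

    If the value contains no '(', it is returned unchanged.
    """
    return value.split("(", 1)[0]


c1 = "GXP_297346(PVALB/human)"

print(strip_annotation(c1))    # GXP_297346
print(c1[1:c1.find("(")])      # XP_297346  (slice starts at index 1, so the "G" is lost)
print(c1[:c1.find("(")])       # GXP_297346 (slice from the start of the string)
```

Using `split` rather than `find` also avoids a surprise when no "(" is present, because `find` returns -1 in that case and the slice would then cut off the last character.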

suggestion for multithreading
by Louise-Amélie Schmitt 09 Aug '11

Hello everyone, I'm using TORQUE with Galaxy, and we noticed that if a tool is multithreaded, the number of cores it needs is not communicated to PBS, leading to job crashes if the required resources are not available when the job is submitted. Therefore I modified the code a little, as follows, in lib/galaxy/jobs/runners/pbs.py:

    # define PBS job options
    attrs.append( dict( name = pbs.ATTR_N, value = str( "%s_%s_%s" % ( job_wrapper.job_id, job_wrapper.tool.id, job_wrapper.user ) ) ) )
    # look up the tool in the multithreading table and request its resources
    mt_file = open('tool-data/multithreading.csv', 'r')
    for l in mt_file:
        l = string.split(l)
        if ( l[0] == job_wrapper.tool.id ):
            attrs.append( dict( name = pbs.ATTR_l, resource = 'nodes', value = '1:ppn='+str(l[1]) ) )
            attrs.append( dict( name = pbs.ATTR_l, resource = 'mem', value = str(l[2]) ) )
            break
    mt_file.close()
    job_attrs = pbs.new_attropl( len( attrs ) + len( pbs_options ) )

(sorry it didn't come out very well due to line breaking) The csv file contains a list of the multithreaded tools, each line containing: <tool id>\t<number of threads>\t<memory needed>\n It works fine, and the jobs wait for their turn properly, but information is duplicated. Perhaps there would be a way to include something similar in Galaxy's original code (if it is not already the case, I may not be up to date) without duplicating data. I hope that helps :) Best regards, L-A
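The lookup described above can be sketched on its own, without the pbs module, to show how a tab-separated line maps to the two resource strings. This is a minimal illustration and not Galaxy code: the function names `load_thread_table` and `resource_requests`, the tool ids, and the memory values are all hypothetical.

```python
import csv
import io


def load_thread_table(handle):
    """Map tool id -> (threads, memory) from a tab-separated file handle."""
    table = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        table[row[0]] = (int(row[1]), row[2])
    return table


def resource_requests(tool_id, table):
    """Return PBS-style resource strings for a tool, or {} if it is not listed."""
    if tool_id not in table:
        return {}
    threads, memory = table[tool_id]
    return {"nodes": "1:ppn=%d" % threads, "mem": memory}


# Hypothetical file content; the tool ids and values are made up for illustration.
example = io.StringIO("bowtie_wrapper\t4\t8gb\ntophat\t8\t16gb\n")
table = load_thread_table(example)

print(resource_requests("bowtie_wrapper", table))
print(resource_requests("some_single_threaded_tool", table))
```

Loading the table once into a dictionary, rather than rereading the file for every job, is one possible design choice; tools that are not listed simply get no extra resource request.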

Mosaik with Paired Reads
by John David Osborne 08 Aug '11

Hello, I'm trying to run Mosaik on our Galaxy instance on Illumina paired reads. However, when I select "paired reads" and Illumina as input options, I can still only select one of the two FASTQ files as input. No second file selector appears as it does with BWA, Bowtie, etc. Can anybody tell me what is going on? Is this a known issue? -John

keeping additional data in the aaChanges tool
by Ximena Bonilla 08 Aug '11

Dear Galaxy staff, I have recently started using your tool and it has been really helpful, thank you! When using Human Genome Variation, aaChanges, I would like to keep some extra lines in the output file from either of the input files. The tool description says I should be able to keep them: "...chromosome, start, and end position as well as the SNP. The SNP can be given using ambiguous-nucleotide symbols or a list of two to four alleles separated by '/'. *Any other columns in the first input file will not be used but will be kept for the output*. The second input file contains..." However, I haven't found a way to actually get them into the output file. What am I missing or doing incorrectly? What I've been trying to keep, by the way, is rs IDs or Ensembl gene IDs. Thank you in advance for your answer. Kind regards, Ximena

Cuffcompare .tmap file
by Aleks Schein 03 Aug '11

Dear all, I am trying to run the Cufflinks installation in Galaxy on Solexa RNA-seq samples from HeLa cells. Running Cuffcompare, according to the manual, should produce a tmap file listing FMI values for detected isoforms. However, my files only have either "100" or "0" in the FMI field, and the FPKM column contains only zeros. Is there something wrong with my input files or parameter settings? Or is it rather a specific issue with Galaxy's Cufflinks installation? The data in question is available here: http://main.g2.bx.psu.edu/u/aleks/h/guided-assemblyadvanced Thanks, Aleks Schein

question about using bowtie
by William Light 02 Aug '11

I am trying to use Bowtie to assign reads to the S. cerevisiae genome. I have data from paired-end SOLiD sequencing with two unique six-base-pair barcodes. Can I use Bowtie to make csfasta and qual files from my mixed original data, split by barcode? I know I can use the trim option to remove the barcode, but how do I specify only one?

calculating percent coverage over the target genome
by David Matthews 01 Aug '11

Hi, does anyone know how to calculate how much of a genome was covered by an alignment, irrespective of the depth at each base? Cheers, David
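One way to frame the question is as a breadth-of-coverage calculation: merge the aligned intervals so that overlapping reads are counted once, then divide the merged length by the genome length. The sketch below assumes a single reference sequence and half-open (start, end) coordinates already extracted from the alignment; the function names and the example intervals are hypothetical, and this is not a Galaxy tool.

```python
def covered_bases(intervals):
    """Count positions covered by at least one half-open (start, end) interval."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Gap before this interval: close the previous merged block.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged block.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


def percent_coverage(intervals, genome_length):
    """Percentage of the genome covered at depth >= 1."""
    return 100.0 * covered_bases(intervals) / genome_length


# Made-up alignment intervals on a 1000-base reference.
alignments = [(0, 50), (25, 75), (200, 250)]
print(percent_coverage(alignments, 1000))  # 12.5
```

Because the intervals are merged before counting, a base covered by one read and a base covered by many reads each contribute exactly once.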

filtering fastq file according to qual score
by Haluk Dogan 01 Aug '11

Hi, I am trying to filter my FASTQ file on the condition that the quality score of reads is less than a minimum score. So far, I have tried both fastq_quality_filter and Filter FASTQ (under NGS: QC and manipulation), but I was not able to do it. Below you can see my FASTQ file:

    @F4HZV5G02CX6WP rank=0000096 x=1092.0 y=1767.0 length=45
    TTGAGCAGCGGCGTCACGGCGGCGGCCTCGGCGGCCGCATAGGCG
    +
    FFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIIHFFDDBDA>

And these are the quality scores: [37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 39, 37, 37, 35, 35, 33, 35, 32, 29] I want to filter out bases whose quality scores are less than 33. Any help would be greatly appreciated. -- HD
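The scores listed above match a Phred+33 encoding, where each quality character's ASCII code minus 33 gives the score (for example, "F" is 70 - 33 = 37). The sketch below decodes such a string and, as one possible reading of "filter bases", trims low-quality bases from the 3' end only. The function names are hypothetical, this is not how the Galaxy tools are implemented, and the quality string is built to match the scores quoted in the post.

```python
def phred33_scores(quality):
    """Decode a Phred+33 quality string into a list of integer scores."""
    return [ord(ch) - 33 for ch in quality]


def trim_3prime(sequence, quality, min_score):
    """Drop bases from the 3' end while their score is below min_score."""
    scores = phred33_scores(quality)
    keep = len(scores)
    while keep > 0 and scores[keep - 1] < min_score:
        keep -= 1
    return sequence[:keep], quality[:keep]


seq = "TTGAGCAGCGGCGTCACGGCGGCGGCCTCGGCGGCCGCATAGGCG"
# 11 x 'F' (37), 25 x 'I' (40), then the nine trailing characters.
qual = "F" * 11 + "I" * 25 + "HFFDDBDA>"

print(phred33_scores(qual)[-9:])   # [39, 37, 37, 35, 35, 33, 35, 32, 29]

trimmed_seq, trimmed_qual = trim_3prime(seq, qual, 33)
print(len(seq), len(trimmed_seq))  # 45 43
```

With a minimum score of 33, only the last two bases (scores 32 and 29) are removed, because trimming stops at the first base from the 3' end that meets the threshold.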
