March 2013 - galaxy-user - lists.galaxyproject.org

Depth command in SAMtools
by Els Willems 27 Mar '13

Dear all,

I am a PhD student working on chicken genomics, with limited experience in bioinformatics. I performed an RNA-Seq experiment with single-end 50 bp reads to find differentially expressed genes between groups. I mapped the data with Tophat and used flagstat and Picard to check the number of mapped reads.

To check the coverage of my genome, I could multiply the number of mapped reads by the read length and divide by the genome size, but since I used mRNA as input material, this average coverage will of course be low (only exons are present). I would like to use samtools depth (as I read on SeqAnswers) to get both the average coverage per covered base AND the total base coverage, but this tool does not seem to be included in Galaxy. Does anyone know a way around this? Other useful tips and tricks are also welcome.

Thank you very much. Have a nice day.

Yours sincerely,
Els

---
Ir. Els Willems
KU Leuven
Department of Biosystems
Division Livestock - Nutrition - Quality
Laboratory of Livestock Physiology
Kasteelpark Arenberg 30 bus 2456
B - 3001 Heverlee
T (+32) 016 32 17 29
F (+32) 016 32 19 94
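
Outside Galaxy, both numbers could be computed from `samtools depth` output, which prints one tab-separated line per position: reference name, position, depth. A minimal sketch, assuming that output has been saved to a text file (for example with `samtools depth accepted_hits.bam > depth.txt`; file names are illustrative). It also includes the naive reads × read length / genome size estimate for comparison. By default `samtools depth` is understood to omit zero-depth positions unless run with `-a`; the sketch skips zero rows either way, so its mean is taken over covered bases only.

```python
import sys


def naive_coverage(mapped_reads: int, read_length: int, genome_size: int) -> float:
    """Genome-wide average coverage: (mapped reads x read length) / genome size."""
    return mapped_reads * read_length / genome_size


def depth_summary(lines):
    """Summarise `samtools depth` output (reference, position, depth; tab-separated).

    Returns (covered_bases, total_base_coverage, mean_depth_over_covered_bases).
    Zero-depth rows (present when depth is run with -a) are skipped.
    """
    covered = 0
    total = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        depth = int(fields[2])
        if depth > 0:
            covered += 1
            total += depth
    mean = total / covered if covered else 0.0
    return covered, total, mean


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python depth_summary.py depth.txt
    with open(sys.argv[1]) as handle:
        covered, total, mean = depth_summary(handle)
    print(f"covered bases:        {covered}")
    print(f"total base coverage:  {total}")
    print(f"mean depth (covered): {mean:.2f}")
```

The total base coverage divided by the number of covered bases gives the average coverage per covered base, which is the figure that stays meaningful for exon-only RNA-Seq data.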

Is there a dataset size limit for running Tophat?
by Du, Jianguang 27 Mar '13

Hi all,

Is there a dataset size limit for running Tophat on Galaxy? If so, what is the limit in number of reads?

Thanks,
Jianguang
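
Whatever the limit turns out to be, checking a dataset against it requires a read count. A minimal sketch for a local FASTQ file, assuming standard four-line records (no wrapped sequence or quality lines); detecting gzip compression from a `.gz` suffix is an assumption about file naming.

```python
import gzip
import sys


def count_fastq_reads(lines) -> int:
    """Count FASTQ records, assuming exactly 4 non-blank lines per record."""
    n_lines = sum(1 for line in lines if line.strip())
    if n_lines % 4 != 0:
        raise ValueError(f"{n_lines} non-blank lines is not a multiple of 4")
    return n_lines // 4


def open_text(path):
    """Open plain or gzip-compressed (.gz) FASTQ as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_reads.py reads.fastq[.gz]
    with open_text(sys.argv[1]) as handle:
        print(count_fastq_reads(handle))
```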

Glitch in .wig visualization in Trackster?
by Michael Axtell 27 Mar '13

Hi everyone,

I'm having an issue with a wiggle file. I'm using Trackster on the public Main instance of Galaxy with a custom genome build, and my wiggle file fails to display. When I add it to the visualization using the 'Add tracks' dialog, I see the usual hatched gray lines with the message "processing data, this may take some time". After a few minutes the message disappears and only the hatched gray lines remain: the intensities are never displayed, and no error message (or any message at all) appears.

The .wig file has been extensively validated against the UCSC spec, and the same file displays data just fine in Broad's IGV, so I'm confident it is formatted correctly. The custom genome is not a great one: it consists of scaffolds rather than pseudomolecules, and there are many scaffolds in the assembly (~2,100 in total; scaffold N50 is 1.3M, reached at scaffold 111; total length ~480M).

If I slice the problematic wiggle file to keep only sub-sections of the data, it sometimes works. I tested a number of such sub-slices, and some worked while others didn't (the numbers refer to scaffold numbers in my custom genome):

1-50: worked
1-100: worked
1-200: worked
1-300: failed
1-400: failed
1-500: failed
100-250: worked
200-300: worked
300-400: worked
500-600: worked

From the above, it seems possible that Trackster doesn't like wig files that exceed a certain number of chromosomes/scaffolds. Or is it some sort of data overload issue?

Some other information: this custom genome build works fine in Trackster for several other datasets in GFF, GFF3 and BED format. Also, the problematic wiggle file is not very large: the full file is only ~48M. It is a fixedStep file with span and step both equal to 100, and the data are relatively sparse.

If anyone has a clue, let me know. Thanks!

--
Michael J. Axtell, Ph.D.
Associate Professor
Dept. of Biology
Penn State University
http://axtell-lab-psu.weebly.com
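
The sub-slice test above can be scripted rather than done by hand. A minimal sketch, assuming the wig file uses fixedStep/variableStep declaration lines carrying a `chrom=` field (as in the UCSC format) and that any track line comes before the first data block: it lists scaffolds in file order and writes out only the blocks for a chosen set. It does not explain the Trackster behaviour; it only makes the bisection reproducible. Scaffold names such as `scaffold_1` are hypothetical.

```python
import sys

DECLARATIONS = ("fixedStep", "variableStep")


def parse_chrom(declaration: str) -> str:
    """Return the chrom= value from a fixedStep/variableStep declaration line."""
    for token in declaration.split():
        if token.startswith("chrom="):
            return token[len("chrom="):]
    raise ValueError(f"no chrom= field in: {declaration!r}")


def wig_chroms(lines):
    """Distinct chroms/scaffolds in order of first appearance."""
    return list(dict.fromkeys(parse_chrom(l) for l in lines if l.startswith(DECLARATIONS)))


def slice_wig(lines, keep):
    """Yield header lines plus only the data blocks whose chrom is in `keep`."""
    keeping = True  # lines before the first declaration (track/browser) are kept
    for line in lines:
        if line.startswith(DECLARATIONS):
            keeping = parse_chrom(line) in keep
        if keeping:
            yield line


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical naming: keep scaffold_1 ... scaffold_200.
    wanted = {f"scaffold_{i}" for i in range(1, 201)}
    with open(sys.argv[1]) as handle:
        sys.stdout.writelines(slice_wig(handle, wanted))
```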

Finding Indels
by Irene Kaplow 27 Mar '13

I would like to be able to replicate on my own machine what Galaxy does to find indels. However, I am facing the following challenges:

1. I want to find indels in the 3-way multiz alignment of hg18, panTro2 and rheMac2, but I cannot find the alignment anywhere. Where can I get the alignment file?
2. I cannot find code for extracting the indels anywhere. What program did you use?

Thanks so much!
Irene
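
On the second question, the thread does not say which program Galaxy uses, so the following only illustrates the general idea and is not Galaxy's method: within one alignment block, compare each species' gapped row against a reference row (for example hg18) and report runs of gap characters. It assumes the rows have already been extracted from a MAF block as equal-length strings with `-` for gaps, and it reports positions as offsets within the block's reference sequence rather than genome coordinates.

```python
def gap_runs(ref: str, other: str):
    """Find indels in `other` relative to `ref` from two equal-length gapped rows.

    Returns (kind, ref_offset, length) tuples, where ref_offset is the number of
    reference bases preceding the event. Columns gapped in both rows are ignored.
    """
    if len(ref) != len(other):
        raise ValueError("alignment rows must have equal length")
    events = []
    ref_pos = 0
    current = None  # (kind, start, length) of the run being extended
    for r, o in zip(ref, other):
        if r == "-" and o == "-":
            continue  # gap in both rows, e.g. an insertion in a third species
        if r == "-":
            kind = "insertion"  # base present in `other` only
        elif o == "-":
            kind = "deletion"  # base present in `ref` only
        else:
            kind = None
        if kind is not None and current is not None and current[0] == kind:
            current = (kind, current[1], current[2] + 1)
        else:
            if current is not None:
                events.append(current)
            current = (kind, ref_pos, 1) if kind is not None else None
        if r != "-":
            ref_pos += 1
    if current is not None:
        events.append(current)
    return events
```

For a 3-way block, the same function would be called once per non-reference row (panTro2 vs hg18, rheMac2 vs hg18).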

Question about tool
by Sandra Santos 27 Mar '13

Hi,

My name is Sandra, and I'm a curator of a database of transcriptional relationships in yeast. We are doing our annual update, and in one paper I found a number of ChIP-seq results. Unfortunately, the authors only included the genome coordinates in the supplemental information, with no indication of what each binding position corresponds to (promoter, ORF...). When I asked the authors for this information, they told me to do it myself. I'm quite busy and don't have time to spend analysing their results, but I decided to check whether Galaxy has a tool that takes this list of positions as input and returns the annotation of each region.

Thanks for your help

--
Sandra C. dos Santos, PhD
Post-Doctoral fellow
https://fenix.ist.utl.pt:443/homepage/ist146260
Biological Sciences Research Group
Instituto Superior Técnico
Portugal
tel: +351 218417233
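
The annotation Sandra describes amounts to checking each ChIP-seq position (for example a peak midpoint) against gene coordinates. A minimal sketch, assuming a gene table with 1-based inclusive start/end and strand is available; the 500 bp promoter window is an arbitrary, hypothetical choice rather than a standard definition, and the gene names in the example are made up. The linear scan is simple but should be adequate for a yeast-sized gene set and a modest peak list.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def annotate(chrom: str, pos: int, genes, promoter_bp: int = 500):
    """Label a 1-based position as ORF or promoter for each overlapping gene.

    `promoter_bp` is an assumed upstream window; upstream depends on strand.
    Returns a list of (label, gene_name); an empty list means intergenic.
    """
    hits = []
    for g in genes:
        if g.chrom != chrom:
            continue
        if g.start <= pos <= g.end:
            hits.append(("ORF", g.name))
        elif g.strand == "+" and g.start - promoter_bp <= pos < g.start:
            hits.append(("promoter", g.name))
        elif g.strand == "-" and g.end < pos <= g.end + promoter_bp:
            hits.append(("promoter", g.name))
    return hits


if __name__ == "__main__":
    genes = [Gene("YFG1", "chrI", 1000, 2000, "+")]  # hypothetical gene
    print(annotate("chrI", 800, genes))
```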

History items remain pending (grey color)
by Priya Bhatt 26 Mar '13

Dear Galaxy Users,

Forgive me in advance, but I am a VERY new Galaxy user! I am trying to go through the Galaxy 101 tutorial provided on the Galaxy website (https://main.g2.bx.psu.edu/u/aun1/p/galaxy101). The first step is to get exon data for chromosome 22 from the UCSC database, and the second is to get SNP data from the same database. I understand the history items should turn green once they are processed; however, after a few hours these items are still grey. Am I doing something wrong? Any help would be greatly appreciated!

Thank you in advance,
Priya

Could not find the refseq from database list
by xiaogeng Feng 26 Mar '13

Hi,

I am using Galaxy to analyze Illumina sequence data from Chlamydia trachomatis F/SW4. We could not find this genome in your dataset list. Is there a way to import the Chlamydia trachomatis F/SW4 sequence into Galaxy? What about the annotation file?

Many thanks,
Xiaogeng

Last day for early registration for GMOD 2013
by Scott Cain 25 Mar '13

Hi,

With several guest speakers, including Jane Lomax from GO, Joseph Rossetto from the EBI, and Manuel Corpas from the Genome Analysis Centre, this GMOD meeting is shaping up to be a very interesting one indeed. Today (March 21) is the last day to register at the early registration price. To register, go to http://gmod2013.eventbrite.com/

I look forward to seeing you next month (April 5-6) in Cambridge, England. For more information about GMOD 2013, go to http://www.gmod.org/wiki/April_2013_GMOD_Meeting

Scott

--
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

Software for post-alignment analysis
by Ianiri, Giuseppe 25 Mar '13

Hello,

I performed my analysis using Tophat, Cufflinks, Cuffmerge and Cuffdiff. With the results I have so far, I can re-annotate incorrectly annotated genes, check for correct splicing, etc. However, I would also like to perform some post-alignment analyses, such as sample clustering, volcano plots and heat maps. I guess I can't do this kind of analysis with Galaxy (I am using the free online version on my Windows laptop), since I haven't found anything in the tools section. Does anyone know of software that I could use on Windows with the data obtained in Galaxy? Any suggestion is really appreciated.

Regards,
Giuseppe

Giuseppe Ianiri, Ph.D.
Division of Cell Biology and Biophysics
School of Biological Sciences
5100 Rockhill Road
University of Missouri-Kansas City
Kansas City, MO 64110
Email: ianirig(a)umkc.edu
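
A volcano plot needs only two numbers per gene: the log2 fold change on the x-axis and -log10 of the p-value (or q-value) on the y-axis. A minimal sketch that extracts those points from a Cuffdiff differential-expression table such as gene_exp.diff, assuming its tab-separated header uses the column names `gene`, `status`, `log2(fold_change)` and `p_value`; these names are an assumption here and should be checked against your own file. The resulting points can be plotted in any spreadsheet or plotting tool on Windows.

```python
import csv
import math


def volcano_points(lines, p_column="p_value", min_p=1e-300):
    """Return (gene, log2_fc, -log10 p) for tested genes in a Cuffdiff *_exp.diff table.

    Rows whose status is not "OK", or whose fold change is not finite
    (e.g. zero expression in one sample), are skipped. p-values are floored
    at `min_p` so that a reported p of 0 does not break log10.
    """
    points = []
    for row in csv.DictReader(lines, delimiter="\t"):
        if row.get("status") != "OK":
            continue
        fc = float(row["log2(fold_change)"])
        if not math.isfinite(fc):
            continue
        p = max(float(row[p_column]), min_p)
        points.append((row["gene"], fc, -math.log10(p)))
    return points


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        with open(sys.argv[1], newline="") as handle:
            for gene, fc, score in volcano_points(handle):
                print(f"{gene}\t{fc}\t{score:.3f}")
```

Passing `p_column="q_value"` would use the multiple-testing-corrected values instead.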

Using BWA to map without any mismatches
by Daniel Sher 21 Mar '13

Hello,

We have a sample containing several bacterial species, and we want to uniquely map RNA-seq reads to the genome of each organism to get the expression pattern of each organism separately. We tried BWA in Galaxy with the "edit distance" (aln -n in the command-line version) set to 0, but none of the reads were mapped (all had the SAM flag set to 4). This must be an artifact, since running BLAST with some of the sequences showed that they have 100% identity to one of our genomes and not to any others, so they should map uniquely. When running BWA with the number of mismatches set between 1 and 5, >90% of our reads were mapped, and the number of mapped reads increased with the mismatch number, so that seems to be working OK. Does the "aln -n" option really determine the number of mismatches? Any ideas why BWA does not run well in Galaxy with -n 0?

Thanks,
Daniel
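
One possible workaround, which does not explain the -n 0 behaviour: map with a small mismatch allowance (which did work) and then keep only alignments whose `NM` tag, the SAM optional tag holding the edit distance to the reference, equals 0. A minimal sketch, assuming BWA wrote `NM:i:` tags on mapped reads; the optional `unique_only` filter further assumes BWA marks unique hits with an `XT:A:U` tag, which should be checked against your own output. Records with FLAG bit 0x4 set (unmapped) are dropped, and header lines are passed through.

```python
import sys


def perfect_matches(sam_lines, unique_only=False):
    """Yield SAM header lines plus mapped records with edit distance 0 (NM:i:0)."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep header lines unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a valid alignment line
        if int(fields[1]) & 4:
            continue  # FLAG bit 0x4: segment unmapped
        tags = fields[11:]  # optional TAG:TYPE:VALUE fields
        if "NM:i:0" not in tags:
            continue
        if unique_only and "XT:A:U" not in tags:
            continue  # assumption: BWA's XT:A:U marks a unique hit
        yield line


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python perfect_matches.py aligned.sam > exact.sam
    with open(sys.argv[1]) as handle:
        sys.stdout.writelines(perfect_matches(handle, unique_only=True))
```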