May 2012 - galaxy-user - lists.galaxyproject.org

"Job is waiting to run" forever
by Lee Silver 21 May '12

For the last two days, nearly every job submitted to Galaxy at "main.g2.bx.psu.edu" has been added to the queue successfully but then stays stuck in the "waiting to run" state. A few exceptions have gone through quickly; the rest stay in limbo. Emptying the cache made no difference. Any advice?

long wait on jobs
by Antony Jose 21 May '12

Hi, I started a workflow yesterday (user name: antonymerlinjose(a)gmail.com) and it still hasn't gone past being queued to run. In fact, no jobs are running. Please advise. Thank you. Antony -- Antony M Jose, Dept. of Cell Biology & Molecular Genetics, University of Maryland, Rm 2116, Bioscience Research Building, College Park, MD - 20742.

cuffmerge loses p_id
by Christopher M. Weber 21 May '12

Hello,

Problem: Cuffmerge loses p_id from the reference annotation in the merged GTF file on the Galaxy online server, resulting in blank CDS Cuffdiff files.

DATA

Input to Cuffmerge or Cuffcompare: two fly Cufflinks transcript assemblies created from BAM files on the Galaxy server using reference annotation and bias correction.

Options:
Use Reference Annotation: YES, UCSC DM3 genes gtf (D. melanogaster) or Ensembl 5.25 genes gtf
Use Sequence Data: YES

Result: tss_id is present in the Cuffmerge output, but p_id is missing with either reference annotation file. Examples are included below.

Reference:
2L protein_coding stop_codon 8608 8610 . + 0 exon_number "2"; gene_id "FBgn0031208"; gene_name "CG11023"; p_id "P13746"; transcript_id "FBtr0300689"; transcript_name "CG11023-RB"; tss_id "TSS8369";

Cuffmerge:
Cufflinks exon 8193 9484 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "2"; gene_name "CG11023"; oId "FBtr0300689"; nearest_ref "FBtr0300689"; class_code "="; tss_id "TSS1";
2L Cufflinks exon 66721 67003 . + . gene_id "XLOC_000002"; transcript_id "TCONS_00000003"; exon_number "1"; gene_name "dbr"; oId "CUFF.1.1"; nearest_ref "FBtr0078100"; class_code "j"; tss_id "TSS2";

Help is much appreciated,
Thanks!
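
One quick way to confirm which attributes survive the merge is to parse the attribute column of a few records and list the keys that are absent. The following is only a minimal sketch: the helper names `parse_attributes` and `missing_keys` are made up for illustration, and the two attribute strings are copied from the example records in the message above.

```python
import re

# Matches one GTF attribute of the form: key "value";
ATTRIBUTE = re.compile(r'(\S+) "([^"]*)";')


def parse_attributes(attribute_field):
    """Return the key/value pairs of a GTF attribute column as a dict."""
    return dict(ATTRIBUTE.findall(attribute_field))


def missing_keys(attribute_field, required=("tss_id", "p_id")):
    """List the required attribute keys that a record does not carry."""
    present = parse_attributes(attribute_field)
    return [key for key in required if key not in present]


# Attribute columns taken from the reference and Cuffmerge records quoted above
reference = ('exon_number "2"; gene_id "FBgn0031208"; gene_name "CG11023"; '
             'p_id "P13746"; transcript_id "FBtr0300689"; '
             'transcript_name "CG11023-RB"; tss_id "TSS8369";')
merged = ('gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; '
          'exon_number "2"; gene_name "CG11023"; oId "FBtr0300689"; '
          'nearest_ref "FBtr0300689"; class_code "="; tss_id "TSS1";')

print(missing_keys(reference))  # []
print(missing_keys(merged))     # ['p_id']
```

Run over every line of a merged GTF, the same check would show whether p_id is missing everywhere or only for some transcripts.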

Question about extracting information from CEAS run results
by shamsher jagat 21 May '12

I have run a ChIP-seq workflow in Galaxy. At the end I ran CEAS: Enrichment on chromosome and annotation (version 1.0.0) to annotate the peaks, which gave me a PDF file showing the distribution of peaks across the genome as a pie chart as well as a histogram. It shows that ~5% of my peaks are in 5' UTR regions, another 3% in 3' UTRs, 63% in exons, and so on. Is there a way to get a list of the genes/reference IDs whose 5' UTR or 3' UTR contains a peak? I tried all the tools in Galaxy but could not find one. There should be some way to extract these summarized results in detail. Does anyone have a suggestion, please? Thanks, Kanwar
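
The kind of list asked for here comes down to an interval overlap between the peak coordinates and a table of UTR coordinates labelled with gene IDs. The sketch below illustrates the idea only; it is not how CEAS computes its summary, the function names are hypothetical, and the coordinates and gene names are made up.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True when two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def genes_hit(peaks, features):
    """Return sorted gene IDs whose feature overlaps a peak on the same chromosome."""
    hits = set()
    for chrom, p_start, p_end in peaks:
        for f_chrom, f_start, f_end, gene in features:
            if chrom == f_chrom and overlaps(p_start, p_end, f_start, f_end):
                hits.add(gene)
    return sorted(hits)


# Made-up coordinates, for illustration only
peaks = [("chr1", 100, 200), ("chr2", 500, 650)]
five_prime_utrs = [
    ("chr1", 150, 180, "geneA"),
    ("chr1", 400, 450, "geneB"),
    ("chr2", 640, 700, "geneC"),
]

print(genes_hit(peaks, five_prime_utrs))  # ['geneA', 'geneC']
```

The nested loop is fine for a sketch; for genome-scale peak and UTR tables, a sorted or indexed interval structure would be the more practical choice.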

Extract data and new genes
by Luciano Cosme 18 May '12

Hi Everyone,

I am working with *Aedes aegypti* and I obtained around 500 million reads (HiSeq2000, 50bp). After analyzing differential gene expression with known packages (Tophat, Cufflinks, Deseq etc.), I was able to find a set of genes of interest, in addition to some functional groups of genes that I already knew I had to look at.

Now, just looking over the 4,758 supercontigs and my data using IGV from the Broad Institute (loading the genome and the SAM files from Tophat), I find a lot of potential new genes (hundreds or thousands of reads aligning to regions where there is no gene annotation). I also find new exons for some genes, or exons with different sizes.

I was thinking of doing a *de novo* assembly to find new transcripts and genes, but I was wondering if there is something else I could do. For example, maybe I could just extract those regions where thousands of reads align (new gene). I know that we can extract the sequence data for a specific transcript; is it possible to extract reads for regions without annotation, based only on the number of reads aligned? Maybe I could pool all the data (from a couple of sequencing lanes), align it back to the genome, and then proceed to gene annotation.

Another problem is that I am not sure how reliable an annotation based only on the HiSeq2000 data would be. I would appreciate it if anyone has an idea or suggestion for how to tackle this problem. Maybe *de novo* assembly is the way to go.

Thank you.
Luciano

--
*Luciano Cosme*
---------------------------------------------
PhD Candidate
Texas A&M Entomology
Vector Biology Research Group
www.lcosme.com
979 845 1885
cosme(a)tamu.edu
---------------------------------------------
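
The idea of pulling out regions "only based on the number of reads aligned" can be sketched as two steps: find runs of positions whose depth reaches a threshold, then drop the runs that overlap known annotation. This is a toy illustration under stated assumptions: the per-base depth list, the threshold of 10, the exon interval and both function names are invented, and in practice depth would come from the aligned reads on each supercontig.

```python
def covered_regions(depths, min_depth):
    """Yield half-open (start, end) runs where per-base depth >= min_depth."""
    start = None
    for pos, depth in enumerate(depths):
        if depth >= min_depth and start is None:
            start = pos                      # a covered run begins here
        elif depth < min_depth and start is not None:
            yield (start, pos)               # the run ended at the previous base
            start = None
    if start is not None:                    # run extends to the end of the contig
        yield (start, len(depths))


def unannotated(regions, annotated):
    """Keep regions that overlap none of the annotated (start, end) intervals."""
    return [
        (s, e) for s, e in regions
        if not any(s < a_end and a_start < e for a_start, a_end in annotated)
    ]


# Made-up per-base depth for one short contig, and one known exon
depths = [0, 0, 12, 15, 14, 0, 0, 30, 31, 29, 0]
annotated_exons = [(7, 10)]

regions = list(covered_regions(depths, min_depth=10))
print(regions)                                # [(2, 5), (7, 10)]
print(unannotated(regions, annotated_exons))  # [(2, 5)]
```

Regions that pass both steps are candidates only; whether they are new genes, new exons of known genes, or mapping artifacts still needs the kind of inspection described in the message.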

LAST CALL FOR ABSTRACTS: High Throughput Sequencing Methods and Applications
by ratschg＠mskcc.org 16 May '12

Dear NGS users/developers,

please consider sending your work on NGS methods and applications to the 5th HiTSeq meeting. This year it is going to be in Long Beach, CA. Please find the (last) call for abstracts below.

Cheers,
Gunnar

HiTSeq 2012: Conference on High Throughput Sequencing Methods and Applications
http://www.hitseq.org
July 13-14, 2012 in Long Beach, CA, USA

Last Call for Abstracts (Deadline Extended)

Key Dates:
* June 1st – Abstract submission deadline (EXTENDED)
* June 8th – Oral/Poster Presentation Decisions
* June 30th – Late breaking poster deadline
* July 13-14 – Conference

Overview:
The Conference on High Throughput Sequencing Methods and Applications (HiTSeq 2012) is a Satellite of the ISMB 2012 conference and brings together biologists and computational scientists interested in exploring the challenges and opportunities in the analysis of high-throughput sequencing (HTS) technologies.

HiTSeq 2012 welcomes submissions on any topic related to high throughput sequencing technologies. We are especially interested in presentations describing methodology to infer various genetic variants (SNVs, small and larger insertions/deletions, copy number variants), methods for analysis of RNA sequencing data (RNA expression, de-novo transcriptome sequencing, novel transcript discovery), and other applications of HTS (transcription factor binding site discovery, methylation profiling, cancer somatic aberration analysis, genome-wide disease association studies by HTS, metagenomics). We are also interested in algorithms for compressing and effectively handling large amounts of HTS data, and the analysis of data from the emerging 3rd and 4th generation sequencing platforms.

Keynotes:
* Dr. Chris Sander. Chair, Computational Biology Program, Memorial Sloan Kettering Cancer Center.
* Dr. Stan Nelson. UCLA Jonsson Comprehensive Cancer Center.

New: Special track on “Personal Genomes for Individualized Medicine”
There is currently a surge in the sequencing of “personal” genomes (or exomes) with the intent of applying them in clinical decision-making. From the diagnosis of Mendelian and idiopathic diseases, the identification of somatic mutations in cancer tumors to guide therapy selection, to the prediction of susceptibility to complex disease to enable prophylactic actions, the applications of personal genomes herald an imminent change in how clinicians use genetic information in individualized medicine. Nevertheless, an analysis bottleneck is becoming apparent, and thus algorithms, methods, visualizations, and efficient software to handle the onslaught of medical genomic information are badly needed.

In this special track of HiTSeq 2012 we aim to showcase the methods and tools that academic and industry researchers are developing in this area, and to encourage a vigorous discussion of what is needed to go forward. We are seeking paper and abstract submissions with an emphasis on analysis methods for the case when the sample size is n=1 (i.e. the patient), how to make genome sequence analysis efficient & clinical grade, as well as new techniques to summarize the wealth of genomic information for clinical decision-making. Applications ranging from childhood diseases and cancer treatment to complex disease susceptibility are all welcome. This special track session will be held on the second day of the SIG (July 14).

Abstracts:
Simultaneously, HiTSeq also allows for submission of abstracts, which will be evaluated independently for the meeting proceedings. The abstracts should target topics of immediate relevance in the field. To be considered for an oral presentation the material should not have been previously published in any journal or proceedings. Late breaking poster abstracts will also be accepted for exceptional research results that became available after the other deadlines. Please check the conference website for submission instructions.

Oral/Poster presentations:
Presentations at HiTSeq may be either plenary talks or a poster at the meeting’s poster session. The final decisions whether each paper or abstract is presented as a talk or a poster will be made by May 30.

Organizers:
Gunnar Rätsch, Sloan-Kettering Institute, USA
Francisco M. De La Vega, Stanford University, USA
Inanc Birol, British Columbia Cancer Agency, Canada
Sohrab Shah, British Columbia Cancer Agency, Canada

Questions? Contact e-mail: hitseq2012(a)hitseq.org

--
Dr. Gunnar Rätsch
Associate Professor and Lab Head
Computational Biology Center
Memorial Sloan-Kettering Cancer Center
Office address: 415-417 E 68th street, Room Z-690, New York, NY 10065, USA
Postal address: 1275 York Avenue, Box 357, New York, NY 10065, USA
http://ratschlab.org
ratschg(a)mskcc.org
p: +1 646 888 2802
f: +1 646 888 3105

Bowtie reference genome
by Fang, Xiefan 15 May '12

Dear Galaxy users,

I am trying to map some multiplexed bisulfite PCR data (Illumina) to our 20+ genes of interest. I want to use "Map with Bowtie for Illumina" in Galaxy, so I need to change the reference genome to my known DNA sequences. However, I don't know how to make such a reference index. Should it be in FASTQ, FASTA or CTF format? My reference sequences are currently in a Word file. How can I convert them to the desired format?

My second question is that the sequencing data for an individual sample are split across 8 different FASTQ files. Does it matter if I map them individually and then merge the results? Or should I combine them first (which will make a very large file) and then do the mapping? Does it change the results either way?

My last question is that, since we are looking at CG methylation, there will certainly be mismatches at the CG sites (such as being a CG or TG) compared to the reference sequence. I am afraid there may be more than 3 mismatches in a read of about 100 bp. Do you think Bowtie will allow this many mismatches? If so, can you suggest a better way to do the mapping?

Thanks for your attention!!!
Xiefan
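
Two of the points above lend themselves to a short sketch. Bowtie indexes are built from FASTA, so sequences pasted out of a document first need to become plain-text FASTA records; and a common strategy in dedicated bisulfite mappers is to convert every C to T in both reads and reference before alignment, so that unmethylated (converted) and methylated (unconverted) cytosines no longer count as mismatches. The code below is an illustration only: `to_fasta` and `c_to_t` are hypothetical helper names, the amplicon names and sequences are made up, and a complete treatment would also need the G-to-A conversion for the opposite strand, which is omitted here.

```python
def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text with wrapped sequence lines."""
    lines = []
    for name, seq in records:
        # Remove spaces and line breaks left over from copy-and-paste
        cleaned = "".join(seq.split()).upper()
        lines.append(">" + name)
        lines.extend(cleaned[i:i + width] for i in range(0, len(cleaned), width))
    return "\n".join(lines) + "\n"


def c_to_t(seq):
    """In-silico bisulfite conversion of one strand: every C becomes T."""
    return seq.upper().replace("C", "T")


# Made-up amplicon names and sequences, for illustration only
records = [("geneA_amplicon", "acgt acgt\nccgg"), ("geneB_amplicon", "TTCGCGAA")]

print(to_fasta(records, width=8), end="")
# >geneA_amplicon
# ACGTACGT
# CCGG
# >geneB_amplicon
# TTCGCGAA

print(c_to_t("ACGTCCGG"))  # ATGTTTGG
```

After such a conversion the methylation state can no longer be read from the converted sequence itself, so the original reads have to be kept and compared against the reference at the aligned positions.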

FASTq sanger to Illumina FASTq
by shamsher jagat 15 May '12

I want to convert Sanger FASTQ to Illumina FASTQ, with the understanding that Sanger is the current option with CASAVA. Is it possible to do such a conversion in Galaxy? Thanks
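
For context, the two FASTQ variants differ only in how the quality line is encoded: Sanger FASTQ stores each Phred score as an ASCII character with offset 33, while Illumina 1.3+ FASTQ uses offset 64. A minimal sketch of the re-encoding follows. It assumes the input really is Phred+33, the function names are made up, and scores above 62 are capped because that is the highest value a printable Phred+64 character can hold.

```python
SANGER_OFFSET = 33     # Phred+33, Sanger FASTQ
ILLUMINA_OFFSET = 64   # Phred+64, Illumina 1.3+ FASTQ
MAX_ILLUMINA_SCORE = 62  # chr(62 + 64) is '~', the last printable ASCII character


def sanger_to_illumina_quality(quality):
    """Re-encode a Phred+33 quality string as Phred+64, capping scores at 62."""
    converted = []
    for char in quality:
        score = ord(char) - SANGER_OFFSET
        if score < 0:
            raise ValueError("not a valid Phred+33 quality character: %r" % char)
        converted.append(chr(min(score, MAX_ILLUMINA_SCORE) + ILLUMINA_OFFSET))
    return "".join(converted)


def convert_record(record):
    """Convert one FASTQ record given as (header, sequence, separator, quality)."""
    header, sequence, separator, quality = record
    return (header, sequence, separator, sanger_to_illumina_quality(quality))


# Scores 0, 10, 20 and 40 in Sanger encoding are '!', '+', '5' and 'I'
record = ("@read1", "ACGT", "+", "!+5I")
print(convert_record(record))  # ('@read1', 'ACGT', '+', '@JTh')
```

Only the quality line changes; the header, sequence and separator lines pass through untouched.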

GCC2012 Update
by Dave Clements 14 May '12

Hello all,

The 2012 Galaxy Community Conference (GCC2012), being held in Chicago, Illinois, July 25-27, is *now just 10 weeks away*. We have updates on:

1. Help set the topics covered on Training Day (by this Friday)
2. Early registration closes June 11
3. Confirmed Speaker List

As always, please let me know if you have any questions, and I hope to see you in Chicago!

Thanks,
Dave Clements
on behalf of the GCC2012 Organizing Committee

*1. Training Day Topics: Vote by this Friday*

A day of tutorials has been added to the agenda this year. The GCC2012 Training Day has 3 parallel tracks, each featuring four 90-minute workshops and covering between 7 and 12 different topics. *Please take a few minutes to vote on topics that you would like to see presented:* http://bit.ly/GCC2012TDSurvey

The survey ends this Friday, May 18, so please provide your feedback now.

*2. Early Registration: Ends June 11*

The GCC2012 early registration deadline is now just 4 weeks away. Registering early saves *36 to 42%* on registration costs, and allows you to book discounted conference lodging before it fills up. *Register now at* http://galaxy.psu.edu/gcc2011/Register.html

*3. Confirmed Speakers and Abstracts Posted*

A list of confirmed speakers and abstracts is now available on the conference web site at http://wiki.g2.bx.psu.edu/Events/GCC2012/Program#Confirmed_Speakers

This list is not yet finished, but will give you a pretty accurate idea of the range of topics that will be discussed during the main meeting.

--
http://galaxyproject.org/GCC2012 <http://galaxyproject.org/wiki/GCC2012>
http://galaxyproject.org/
http://getgalaxy.org/
http://usegalaxy.org/
http://galaxyproject.org/wiki/

May 11, 2012 Galaxy Development News Brief
by Jennifer Jackson 11 May '12

Dear Galaxy Community,

The May 11, 2012 Galaxy Distribution has been released at: http://bitbucket.org/galaxy/galaxy-dist/

Complete News Brief <http://wiki.g2.bx.psu.edu/DevNewsBriefs/2012_05_11>

*Highlights:*

* *EMBOSS* tools and datatypes have now moved from the _/Galaxy distribution/_ <http://bitbucket.org/galaxy/galaxy-dist/> to the /Galaxy Main Tool Shed/ <http://toolshed.g2.bx.psu.edu>.
* Tool Integration Tests <http://wiki.g2.bx.psu.edu/Tool%20Shed#Using_Galaxy.27s_functional_test_fram…>, Custom Tool Panel Configuration <http://wiki.g2.bx.psu.edu/Tool%20Shed#Managing_the_layout_of_your_Galaxy_to…>, and Configurable Tool Output Locations <http://wiki.g2.bx.psu.edu/Admin/Tools/Multiple%20Output%20Files>.
* Improved Multiprocess Job Handling <http://wiki.g2.bx.psu.edu/Admin/Config/Performance/Web%20Application%20Scal…> and Enhanced OpenID <http://openid.net> Support.
* GATK <http://www.broadinstitute.org/gsa/wiki> version 1.4, FreeBayes <http://github.com/ekg/freebayes>, Updated *Megablast* using NCBI BLAST+ <http://blast.ncbi.nlm.nih.gov/Blast.cgi>, Trinity <http://trinityrnaseq.sourceforge.net>, WormBase 2 <http://www.wormbase.org>, and IGB <http://bioviz.org/igb/index.html> external display.
* Trackster <http://wiki.g2.bx.psu.edu/Learn/Visualization> upgrades include /strand coloring/, /interval datatype support/, and /tabix indexing/ (fast!!).
* Updated *User Interface*, *Workflow-API* upgrades, and *Custom UCSC Display*.

*http://getgalaxy.org*

*new*: % hg clone http://www.bx.psu.edu/hg/galaxy galaxy-dist
*upgrade*: % hg pull -u -r 17d57db9a7c0

/Thanks for using Galaxy!/

Jennifer Jackson
Galaxy Team <http://wiki.g2.bx.psu.edu/Galaxy%20Team>
http://galaxyproject.org
