For the last two days, nearly every job submitted to Galaxy at "main.g2.bx.psu.edu" has been successfully added to the queue but has then stuck in the "waiting to run" state. A few jobs have gotten through quickly, but the rest stay in limbo. Emptying the cache made no difference.
I started a workflow yesterday (user name: antonymerlinjose(a)gmail.com) and
it still hasn't gone past being queued to run. In fact, no jobs are
running. Please advise. Thank you.
Antony M Jose,
Dept. of Cell Biology & Molecular Genetics,
University of Maryland,
Rm 2116, Bioscience Research Building,
College Park, MD - 20742.
I have run a ChIP-seq workflow in Galaxy. At the end I ran CEAS: Enrichment
on chromosome and annotation (version 1.0.0) to annotate the peaks,
which gave me a PDF file showing the distribution of peaks across the genome
as a pie chart as well as a histogram. It shows that ~5% of my peaks are in
5' UTR regions, another 3% in 3' UTRs, 63% in exons, and so on. Is there a way
to get a list of the genes/reference IDs that have peaks in the 5' UTR or 3' UTR? I
tried all the tools in Galaxy but could not find one. There should be some way
to extract these summarized results in detail. Does anyone have a suggestion?
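
A minimal sketch of the kind of lookup being asked for here, assuming the peaks and the UTR annotations are both available as simple interval lists (for example, exported as BED files). The function name, coordinates, and gene IDs below are hypothetical, and this is not how CEAS computes its summary; it only illustrates the interval-overlap idea.

```python
from collections import defaultdict


def genes_with_peaks_in_feature(peaks, features):
    """Return sorted gene IDs whose feature intervals overlap at least one peak.

    peaks:    iterable of (chrom, start, end), 0-based half-open as in BED
    features: iterable of (chrom, start, end, gene_id), e.g. all 5' UTRs
    """
    # Group peaks by chromosome so each feature is only compared to nearby peaks.
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))

    hits = set()
    for chrom, f_start, f_end, gene_id in features:
        for p_start, p_end in by_chrom.get(chrom, []):
            # Two half-open intervals overlap when each starts before the other ends.
            if p_start < f_end and f_start < p_end:
                hits.add(gene_id)
                break
    return sorted(hits)


# Toy data (hypothetical coordinates and gene IDs)
peaks = [("chr1", 100, 200), ("chr2", 500, 600)]
five_prime_utrs = [
    ("chr1", 150, 180, "geneA"),
    ("chr1", 300, 350, "geneB"),
    ("chr2", 590, 700, "geneC"),
]
print(genes_with_peaks_in_feature(peaks, five_prime_utrs))  # ['geneA', 'geneC']
```

The same function can be run once with 5' UTR intervals and once with 3' UTR intervals to get the two gene lists separately.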
I am working with *Aedes aegypti* and I obtained around 500 million
reads (HiSeq2000, 50 bp). After analyzing differential gene
expression with established packages (TopHat, Cufflinks, DESeq, etc.), I was able
to find a set of genes of interest, besides some functional groups of genes
that I already knew I had to look at. Now, just looking over the 4,758
supercontigs and my data using IGV from Broad Institute (loading the genome
and the SAM files from TopHat), I find a lot of potential new genes
(hundreds or thousands of reads aligning to regions where there is no gene
annotation). I also find new exons for some genes, or exons with different
sizes. I was thinking of doing a *de novo* assembly to find new transcripts
and genes, but I was wondering if there is something else I could do. For
example, maybe I could just extract those regions where thousands of reads
align (new gene). I know that we can extract the sequence data for a specific
transcript; is it possible to extract reads for regions without annotation,
based only on the number of reads aligned? Maybe I could pool all the data
(from a couple of sequencing lanes), align it back to the genome,
and then proceed to gene annotation. Another problem is that I am not sure
how reliable an annotation based only on HiSeq2000 data would be.
I would appreciate it if anyone has an idea or suggestion on how to
tackle this problem. Maybe *de novo* assembly is the way to go.
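
The idea of pulling out regions purely by read count can be sketched as follows. This is an illustration only, assuming a per-base depth list for one supercontig has already been computed from the alignments and that known gene intervals are available; the function name, the threshold of 100, and the toy numbers are all hypothetical.

```python
def covered_unannotated_regions(depth, annotated, min_depth=100):
    """Return (start, end) half-open runs with depth >= min_depth and no annotation.

    depth:     list of per-base read depths for one supercontig
    annotated: list of (start, end) half-open intervals of known genes
    """
    # Mark every base that falls inside an existing annotation.
    is_annotated = [False] * len(depth)
    for start, end in annotated:
        for pos in range(max(start, 0), min(end, len(depth))):
            is_annotated[pos] = True

    regions = []
    run_start = None
    for pos, d in enumerate(depth):
        keep = d >= min_depth and not is_annotated[pos]
        if keep and run_start is None:
            run_start = pos                    # a new candidate region opens
        elif not keep and run_start is not None:
            regions.append((run_start, pos))   # the current region closes
            run_start = None
    if run_start is not None:
        regions.append((run_start, len(depth)))
    return regions


# Toy example: bases 6-7 are deeply covered but already annotated.
depth = [0, 0, 150, 200, 180, 0, 500, 500, 0, 120]
annotated = [(6, 8)]
print(covered_unannotated_regions(depth, annotated))  # [(2, 5), (9, 10)]
```

The resulting intervals could then be used to pull the underlying reads or the genomic sequence for closer inspection. For a whole genome, a per-base Python list would be too slow and memory-hungry, so this is only meant to make the logic concrete.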
Texas A&M Entomology
Vector Biology Research Group
979 845 1885
Dear NGS users/developers,
please consider sending your work on NGS methods and applications to
the 5th HiTSeq meeting. This year it is going to be in Long Beach, CA.
Please find the (last) call for abstracts below.
HiTSeq 2012: Conference on High Throughput Sequencing Methods and Applications
July 13-14, 2012 in Long Beach, CA, USA
Last Call for Abstracts (Deadline Extended)
* June 1st – Abstract submission deadline (EXTENDED)
* June 8th – Oral/Poster Presentation Decisions
* June 30th – Late breaking poster deadline
* July 13-14 - Conference
The Conference on High Throughput Sequencing Methods and Applications
(HiTSeq 2012) is a Satellite of the ISMB 2012 conference and brings
together biologists and computational scientists interested in
exploring the challenges and opportunities in the analysis of high-
throughput sequencing (HTS) technologies. HiTSeq 2012 welcomes
submissions on any topic related to high throughput sequencing
technologies. We are especially interested in presentations describing
methodology to infer various genetic variants (SNVs, small and larger
insertions/deletions, copy number variants), methods for analysis of
RNA sequencing data (RNA expression, de-novo transcriptome sequencing,
novel transcript discovery), and other applications of HTS
(transcription factor binding site discovery, methylation profiling,
cancer somatic aberration analysis, genome-wide disease association
studies by HTS, metagenomics). We are also interested in algorithms
for effectively compressing and handling large amounts of HTS data,
and the analysis of data from the emerging 3rd and 4th generation
* Dr. Chris Sander. Chair, Computational Biology Program, Memorial Sloan Kettering Cancer Center.
* Dr. Stan Nelson. UCLA Jonsson Comprehensive Cancer Center.
New: Special track on “Personal Genomes for Individualized Medicine”
There is currently a surge in the sequencing of “personal” genomes (or
exomes) with the intent of applying them in clinical decision-making.
From the diagnosis of Mendelian and idiopathic diseases, the
identification of somatic mutations in cancer tumors to guide therapy
selection, to the prediction of susceptibility to complex disease to
enable prophylactic actions, the applications of personal genomes
herald an imminent change in how clinicians use genetic information in
individualized medicine. Nevertheless, an analysis bottleneck is
becoming apparent, and thus algorithms, methods, visualizations, and
efficient software to handle the onslaught of medical genomic
information are badly needed.
In this special track of HiTSeq 2012 we aim to showcase the methods
and tools that academic and industry researchers are developing in
this area, and to encourage a vigorous discussion of what is needed to
go forward. We are seeking paper and abstract submissions with an
emphasis on analysis methods for the case when the sample size is n=1
(i.e. the patient), how to make genome sequence analysis efficient &
clinical grade, as well as new techniques to summarize the wealth of
genomic information for clinical decision-making. Applications ranging
from childhood diseases to cancer treatment and complex disease
susceptibility are all welcome. This special track session will be
held on the second day of the SIG (July 14).
Simultaneously, HiTSeq also allows for submission of abstracts, which
will be evaluated independently for the meeting proceedings. The
abstracts should target topics of immediate relevance in the field. To
be considered for an oral presentation the material should not have
been previously published in any journal or proceedings. Late breaking
poster abstracts will also be accepted for exceptional research
results that became available after the other deadlines. Please check
the conference website for submission instructions.
Presentations at HiTSeq may be either plenary talks or posters at the
meeting’s poster session. The final decision on whether each paper or
abstract is presented as a talk or a poster will be made by May 30.
Gunnar Rätsch, Sloan-Kettering Institute, USA
Francisco M. De La Vega, Stanford University, USA
Inanc Birol, British Columbia Cancer Agency, Canada
Sohrab Shah, British Columbia Cancer Agency, Canada
Questions? Contact e-mail: hitseq2012(a)hitseq.org
Dr. Gunnar Rätsch
Associate Professor and Lab Head
Computational Biology Center
Memorial Sloan-Kettering Cancer Center
415-417 E 68th street, Room Z-690
New York, NY 10065, USA
1275 York Avenue, Box 357
New York, NY 10065, USA
p: +1 646 888 2802
f: +1 646 888 3105
Dear Galaxy users,
I am trying to map some multiplexed bisulfite PCR data (Illumina) to our 20+ genes of interest. I want to use "Map with Bowtie for Illumina" in Galaxy, so I need to replace the reference genome with my known DNA sequences. However, I don't know how to make such a reference index. Should it be in FASTQ, FASTA or CTF format? My reference sequences are currently in a Word file. How can I convert them to the required format?
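
For orientation: FASTA is the plain-text format normally used for reference sequences (a ">" header line followed by the bases), whereas FASTQ carries reads together with their quality scores. A minimal sketch of producing FASTA text from sequences pasted out of a document, assuming each sequence is available as a name and a string; the function name and the gene names are hypothetical.

```python
def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text.

    Whitespace and line breaks left over from copy-and-paste are removed,
    bases are upper-cased, and sequences are wrapped at `width` characters.
    """
    lines = []
    for name, seq in records:
        cleaned = "".join(seq.split()).upper()
        lines.append(">" + name)
        for i in range(0, len(cleaned), width):
            lines.append(cleaned[i:i + width])
    return "\n".join(lines) + "\n"


# Hypothetical sequences copied from a word-processor document
records = [
    ("geneA", "acgt acgt\nttga"),
    ("geneB", "GGCC ttaa"),
]
print(to_fasta(records, width=8), end="")
```

The resulting text should be saved as plain text (not as a word-processor file) before being uploaded as a FASTA dataset.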
My second question: the sequencing data for each individual sample are split across 8 different FASTQ files. Can I map them individually and then merge the results? Or should I combine them first (which will produce a very large file) and then do the mapping? Does the choice change the results?
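
If the files are combined first, plain concatenation is enough, since FASTQ is a sequence of independent four-line records. A minimal sketch, assuming uncompressed files that each end with a newline; the function name and the demonstration reads are hypothetical.

```python
import os
import shutil
import tempfile


def concatenate_fastq(input_paths, output_path):
    """Append each input FASTQ file to output_path, in the order given."""
    with open(output_path, "wb") as out:
        for path in input_paths:
            with open(path, "rb") as part:
                # Stream the file in chunks so large inputs are never held in memory.
                shutil.copyfileobj(part, out)


# Tiny demonstration with two made-up one-read files
workdir = tempfile.mkdtemp()
parts = []
for i, read in enumerate(["@r1\nACGT\n+\nIIII\n", "@r2\nTTGA\n+\nIIII\n"]):
    path = os.path.join(workdir, "part%d.fastq" % i)
    with open(path, "w") as handle:
        handle.write(read)
    parts.append(path)

merged = os.path.join(workdir, "merged.fastq")
concatenate_fastq(parts, merged)
with open(merged) as handle:
    merged_text = handle.read()
print(merged_text.count("\n") // 4, "reads")  # 2 reads
```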
My last question: since we are looking at CG methylation, there will certainly be mismatches at the CG sites (a site may read as CG or TG) compared to the reference sequence. I am afraid there may be more than 3 mismatches in a read of about 100 bp. Do you think Bowtie will allow this many mismatches? If not, can you suggest a better way to do the mapping?
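
One common way around the mismatch problem, used by dedicated bisulfite aligners such as Bismark, is to convert every C to T in both the reads and the reference before alignment, so that conversion events no longer count as mismatches. The sketch below only illustrates that idea for one strand (the opposite strand would need the corresponding G-to-A conversion); the function names and sequences are made up for the example.

```python
def bisulfite_convert(seq):
    """Replace every C with T, mimicking full bisulfite conversion of one strand."""
    return seq.upper().replace("C", "T")


def mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


reference = "ACGTCCGATCGA"
# Hypothetical read: three unmethylated Cs were converted to T, while the
# methylated C of the CpG at positions 5-6 was protected and stays C.
read = "ATGTTCGATTGA"

print(mismatches(reference, read))                                        # 3
print(mismatches(bisulfite_convert(reference), bisulfite_convert(read)))  # 0
```

After alignment in the converted space, the methylation state is read back by comparing the original read to the original reference at each C position.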
Thanks for your attention!!!
The 2012 Galaxy Community Conference (GCC2012), being held in Chicago,
Illinois, July 25-27, is *now just 10 weeks away*. We have updates on:
1. Help set the topics covered on Training Day (by this Friday)
2. Early registration closes June 11
3. Confirmed Speaker List
As always, please let me know if you have any questions, and I hope to see
you in Chicago!
on behalf of the GCC2012 Organizing Committee
*1. Training Day Topics: Vote by this Friday*
A day of tutorials has been added to the agenda this year. The GCC2012
Training Day has 3 parallel tracks, each featuring four 90-minute
workshops and covering between 7 and 12 different topics. Please take
a few minutes to vote on the topics that you would like to see presented:
The survey ends this Friday, May 18, so please provide your feedback now.
*2. Early Registration: Ends June 11*
The GCC2012 early registration deadline is now just 4 weeks away. Registering early
saves *36 to 42%* on registration costs, and allows you to book discounted
conference lodging before it fills up. *Register now at*
*3. Confirmed Speakers and Abstracts Posted*
A list of confirmed speakers and abstracts is now available on the
conference web site at
This list is not yet finished, but will give you a pretty accurate idea of
the range of topics that will be discussed during the main meeting.