I am a PhD student working on chicken genomics, with limited experience in bioinformatics. I performed an RNA-Seq experiment with single-end 50 bp reads to find differential gene expression between groups. I mapped the data with TopHat and used flagstat and Picard to check the number of mapped reads.
To check the coverage of my genome, I could multiply the number of mapped reads by the read length and divide by the genome size, but since I used mRNA as input material, the average coverage will be low (only exons are present). I would like to use samtools depth (as I read on SEQanswers) to get the average coverage per covered base AND the total base coverage, but this does not seem to be included in Galaxy. Does anyone know a way around this? Other useful tips and tricks are also welcome. Thank you very much.
Have a nice day.
Yours Sincerely, Els
Ir. Els Willems
Department of Biosystems
Division Livestock - Nutrition - Quality
Laboratory of Livestock Physiology
Kasteelpark Arenberg 30 bus 2456
B - 3001 Heverlee
T (+32) 016 32 17 29
F (+32) 016 32 19 94
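The coverage arithmetic in the message above can be sketched in a few lines of Python. This is only a minimal illustration, not a Galaxy tool: it assumes the three-column, tab-separated output of `samtools depth` (reference name, position, depth) is available as text lines, and the function names `naive_coverage` and `summarize_depth` are hypothetical.

```python
def naive_coverage(mapped_reads, read_length, genome_size):
    """Genome-wide average coverage: reads * read length / genome size."""
    return mapped_reads * read_length / genome_size


def summarize_depth(lines):
    """Summarize `samtools depth` style lines (chrom<TAB>pos<TAB>depth).

    Returns (covered_bases, total_depth, mean_depth_per_covered_base).
    Positions with depth 0 are ignored, so the mean is taken over
    covered bases only.
    """
    covered_bases = 0
    total_depth = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        _chrom, _pos, depth = line.split("\t")[:3]
        depth = int(depth)
        if depth > 0:
            covered_bases += 1
            total_depth += depth
    mean_depth = total_depth / covered_bases if covered_bases else 0.0
    return covered_bases, total_depth, mean_depth
```

By default `samtools depth` reports only positions with non-zero depth, so the mean computed this way is a per-covered-base average and is expected to be much higher than the naive genome-wide figure for RNA-Seq data.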
I'm having an issue with a wiggle file. I'm using Trackster on the
public-main instance of Galaxy, with a custom genome build. My wiggle
file fails to be shown. When added to the visualization using the 'add
tracks' dialog, I see the usual hatched gray lines with the message
"processing data, this may take some time". But then after a few
minutes the track just goes to hatched gray lines with no messages,
and the intensities are never displayed, nor is any error message (or
any message at all, just stuck with the hatched gray lines).
The .wig file has been extensively validated to conform to UCSC spec.
In addition, the same file displays data just fine when loaded into
Broad's IGV. So I'm confident it is formatted correctly.
The custom genome is not a great one .. scaffolds not pseudomolecules,
and there are many thousands of scaffolds in the assembly (scaffold
N50 is 1.3M at scaffold 111 out of ~2,100 scaffolds; total length
~480M). If I slice my problematic wiggle file to only keep
sub-sections of the data, sometimes it works. I tested a number of
such sub-slices, and some worked and some didn't, as below (the
numbers refer to scaffold numbers in my custom genome):
1-50 : worked
1-100 : worked
1-200 : worked
1-300 : failed
1-400 : failed
1-500 : failed
100-250 : worked
200-300 : worked
300-400 : worked
500-600 : worked
From the above, it seems possible the error is that Trackster just
doesn't like wig files that exceed a certain number of
chromosomes/scaffolds? Or some sort of data overload issue?
Some other information: This custom genome build works fine on
Trackster to visualize several other datasets in gff, gff3, and bed
format. In addition, the problem wiggle file is not so large .. the
full file is only ~48M. It is a fixedStep file with span and step both
equal to 100, and the data are relatively sparse.
If anyone has a clue, let me know .. thanks!
Michael J. Axtell, Ph.D.
Dept. of Biology
Penn State University
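The slicing experiment described above (keeping only a subset of scaffolds from a fixedStep wiggle file) could be reproduced with a small filter like the one below. It is a rough sketch under stated assumptions: the function name `slice_wiggle` and the `scaffold_N` naming are hypothetical, and it relies only on the `chrom=` field of the `fixedStep`/`variableStep` declaration lines defined by the UCSC wiggle format.

```python
def slice_wiggle(lines, keep):
    """Keep only wiggle blocks whose chrom= value is in `keep`.

    `lines` is an iterable of wiggle lines; `keep` is a set of
    chromosome/scaffold names. Track, browser and comment lines are
    always passed through.
    """
    out = []
    keeping = False
    for line in lines:
        stripped = line.strip()
        if stripped.startswith(("track", "browser", "#")):
            out.append(line)
            continue
        if stripped.startswith(("fixedStep", "variableStep")):
            # Declaration line, e.g. "fixedStep chrom=X start=1 step=100 span=100"
            fields = dict(
                token.split("=", 1)
                for token in stripped.split()[1:]
                if "=" in token
            )
            keeping = fields.get("chrom") in keep
        if keeping:
            out.append(line)
    return out
```

For example, `slice_wiggle(open("data.wig"), {"scaffold_%d" % i for i in range(1, 301)})` would keep scaffolds 1-300, assuming that naming scheme and a hypothetical file name.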
I would like to be able to replicate what Galaxy does to find indels on
my own machine. However, I am facing the following challenges:
1. I want to find indels in the 3-way multiz alignment of hg18,
panTro2, and rheMac2, but I cannot find the alignment anywhere. Where
can I get the alignment file?
2. I cannot find code for extracting the indels anywhere. What
program did you use?
Thanks so much!
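As a rough illustration of what extracting indels from an alignment involves, the sketch below finds runs of gap characters in one aligned sequence row. This is not Galaxy's implementation, which the message above is asking about; the function name `gap_runs` is hypothetical, and deciding whether a gap is an insertion or a deletion would additionally require comparing the rows of all three species.

```python
def gap_runs(aligned_seq):
    """Return (start_column, length) for each run of '-' in an aligned row."""
    runs = []
    start = None
    for i, ch in enumerate(aligned_seq):
        if ch == "-":
            if start is None:
                start = i  # a new gap run begins here
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        # The row ends inside a gap run
        runs.append((start, len(aligned_seq) - start))
    return runs
```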
My name is Sandra and I'm a curator of a database of transcriptional relationships in yeast. We are doing our annual update, and in one paper I found a number of ChIP-seq results. Unfortunately, the authors included only the genome coordinates in the supplemental information, with no indication of what each binding position corresponds to (promoter, ORF...). When I asked the authors for this information, they told me to do it myself. I'm actually quite busy and don't have time to waste analysing their results, but decided to check whether Galaxy has a tool that can take this list of positions as input and return the annotation of each region.
Thanks for your help
Sandra C. dos Santos, PhD
Biological Sciences Research Group
Instituto Superior Técnico
tel: +351 218417233
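The kind of annotation asked for above (promoter versus ORF for each coordinate) can be sketched as a simple interval lookup. This is only an illustration under assumptions: the gene records, the function name `classify_position`, and the 500 bp promoter window are all hypothetical, and a real analysis would load gene coordinates from an annotation file for the matching genome build.

```python
def classify_position(chrom, pos, genes, promoter_len=500):
    """Label a position as ORF, promoter, or intergenic.

    `genes` is a list of dicts with keys: name, chrom, start, end, strand
    (1-based inclusive coordinates, start <= end). The promoter is taken,
    as an assumption, to be the `promoter_len` bases upstream of the gene.
    """
    labels = []
    for gene in genes:
        if gene["chrom"] != chrom:
            continue
        if gene["start"] <= pos <= gene["end"]:
            labels.append(("ORF", gene["name"]))
        elif gene["strand"] == "+" and gene["start"] - promoter_len <= pos < gene["start"]:
            labels.append(("promoter", gene["name"]))
        elif gene["strand"] == "-" and gene["end"] < pos <= gene["end"] + promoter_len:
            labels.append(("promoter", gene["name"]))
    return labels or [("intergenic", None)]
```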
Dear Galaxy Users,
Forgive me in advance, but I am a VERY new Galaxy user!
I am trying to go through the Galaxy 101 tutorial provided on the Galaxy website (https://main.g2.bx.psu.edu/u/aun1/p/galaxy101). The first step asks to get exon data for chromosome 22 from the UCSC database, and the second step asks to get the SNP data from the same database. I understand the history items should turn green once they are processed; however, after a few hours these items remain grey. Am I doing something wrong?
Any help would be greatly appreciated!
Thank you in advance,
Hi, I am using Galaxy to analyze Chlamydia trachomatis F/SW4 Illumina sequence data. We could not find this genome in your dataset list. Is there a way to import the Chlamydia trachomatis F/SW4 sequence into Galaxy? How about the annotation file? Many thanks, Xiaogeng
With several guest speakers, including Jane Lomax from GO, Joseph
Rossetto from the EBI, and Manuel Corpas from the Genome Analysis
Centre, this GMOD meeting is shaping up to be a very interesting
meeting indeed. Today (March 21) is the last day to register with the
early registration pricing. To register, go to
I look forward to seeing you next month (April 5-6) in Cambridge,
England. For more information about GMOD 2013, go to
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research
I performed my mapping using TopHat - Cufflinks - Cuffmerge - Cuffdiff.
With the information I have for my analysis so far, I can reannotate wrong
genes, check for correct splicing etc. However, I would like to perform
some post-alignment analysis, for example sample clustering, volcano
plots, heat maps etc. I guess I can't do this kind of analysis with Galaxy
(I am using the free online version on my Windows laptop) since I haven't
found anything in the tools section. Does anyone know of software that I
could use on Windows with the data obtained in Galaxy?
Any suggestion is really appreciated.
Giuseppe Ianiri, Ph.D.
Division of Cell Biology and Biophysics
School of Biological Sciences
5100 Rockhill Road
University of Missouri-Kansas City
Kansas City, MO 64110
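For the volcano plot mentioned above, the plotted coordinates are just the log2 fold change on the x axis and -log10 of the p-value on the y axis. The sketch below computes them from a tab-separated Cuffdiff-style table using only the Python standard library, which also runs on Windows. The function name `volcano_points` is hypothetical, and the column names (`gene`, `status`, `log2(fold_change)`, `p_value`, `q_value`) are an assumption that should be checked against the header of the actual file.

```python
import csv
import math


def volcano_points(handle, q_cutoff=0.05):
    """Return (gene, log2_fold_change, -log10(p), significant) tuples."""
    points = []
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["status"] != "OK":
            continue  # skip genes that were not successfully tested
        log2_fc = float(row["log2(fold_change)"])
        p_value = float(row["p_value"])
        if math.isinf(log2_fc) or math.isnan(log2_fc) or p_value <= 0:
            continue  # cannot be placed on a finite axis
        significant = float(row["q_value"]) <= q_cutoff
        points.append((row["gene"], log2_fc, -math.log10(p_value), significant))
    return points
```

The resulting list can be written out and plotted in any spreadsheet or plotting package.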
We have a sample containing several bacterial species and we want to uniquely map RNA-seq reads to the genomes of each of our organisms to get the expression patterns of each organism separately. We tried to use BWA in Galaxy with the "edit distance" (aln -n in the command-line version) set to 0, but none of the reads were mapped (all had the SAM flag set to "4"). This must be an artifact, since running BLAST with some of the sequences showed that they have 100% identity to one of our genomes and not to any others, so they should map uniquely.
When running BWA with the number of mismatches set to between 1 and 5, >90% of our reads were mapped, and the number of mapped reads increased with the mismatch number, so that seems to be working OK.
Does the "aln -n" option really determine the number of mismatches? Any ideas why BWA will not run well in Galaxy using -n 0?
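The value 4 reported above corresponds to the 0x4 bit of the SAM FLAG field, which marks a read as unmapped. A minimal way to tally mapped and unmapped reads from SAM text is sketched below; the function name `count_mapped` is hypothetical, and the sketch counts alignment records rather than unique reads.

```python
def count_mapped(sam_lines):
    """Count (mapped, unmapped) records using the 0x4 bit of the FLAG field."""
    mapped = 0
    unmapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & 0x4:
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped
```

Comparing these counts between runs with different `-n` settings gives a quick check of how the mapped fraction changes.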