I am attempting to use Galaxy to calculate the mean sequence read
length and identify the range of read lengths for my 454 data. The
data has already been organized and sorted by species. The format of
the data is as follows:
etc...for each species
I have attempted to use the "Summary Statistics" button; however, it
appears to work only for numerical data and not sequence data. Is this
User name: dac330
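For readers with the same question, here is a minimal sketch of the calculation outside Galaxy. It assumes the reads are in FASTA format, which the post does not confirm; the function names `read_lengths` and `length_summary` are made up for illustration.

```python
from statistics import mean


def read_lengths(fasta_lines):
    """Yield the length of each sequence found in FASTA-formatted lines."""
    length = None  # None until the first header line is seen
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)  # sequences may span several lines
    if length is not None:
        yield length


def length_summary(fasta_lines):
    """Return the count, mean, minimum and maximum read length."""
    lengths = list(read_lengths(fasta_lines))
    if not lengths:
        raise ValueError("no sequences found")
    return {
        "count": len(lengths),
        "mean": mean(lengths),
        "min": min(lengths),
        "max": max(lengths),
    }


# Small made-up example: three reads of length 6, 10 and 2.
example = """>read1
ACGTAC
>read2
ACGT
ACGTAC
>read3
AC
""".splitlines()

summary = length_summary(example)
print(summary)
```

For a real file, the lines could come from an open file handle instead of the example list, with one call per species file if the data are already split that way.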
Dear Galaxy users,
We have done deep sequencing on some known genomic loci using a
HiSeq 2000. I have already mapped the reads to the reference sequences
using Galaxy. In the next step, I want to find SNPs and calculate the SNP
percentage within the reads. There are 500,000 to 1,000,000 reads per
biological sample. Can I do this with Galaxy? If not, are there other programs
available for Windows? Considering that I am not very familiar with
University of Florida
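As a rough illustration of what "SNP percentage" could mean at a single position, the sketch below counts the bases that reads show at one reference coordinate and reports the share that differ from the reference. This is an assumption about the intended calculation, not a variant caller; the function name `non_reference_percentage` and the example bases are hypothetical, and extracting the per-position bases from the mapped reads is left out.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def non_reference_percentage(ref_base, observed_bases):
    """Return (percent of reads not matching ref_base, per-base counts).

    Bases other than A, C, G or T (for example N) are ignored.
    """
    counts = Counter(
        base.upper() for base in observed_bases if base.upper() in VALID_BASES
    )
    depth = sum(counts.values())
    if depth == 0:
        return 0.0, counts
    mismatches = depth - counts[ref_base.upper()]
    return 100.0 * mismatches / depth, counts


# Made-up column of read bases at one position: 8 A, 2 G and one N.
percent, counts = non_reference_percentage("A", list("AAAGAAAGAAN"))
print(percent, dict(counts))
```

A real analysis would also need to account for base quality and sequencing error before calling a position a SNP.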
I have tested the new Cuffdiff version in Galaxy and was very eager to get the replicate data for each test as well. Has anyone tried that yet and succeeded with replicates in the downstream R package CummeRbund? Although I successfully built the database from the 15 output files (including read group tracking) with no error message, I cannot plot anything with replicates=T.
Error in sqliteExecStatement(con, statement, bind.data) :
RS-DBI driver: (error in statement: near "from": syntax error)
Anyone know the problem?
Johanna Sandgren, PhD
Department of Oncology-Pathology
CCK, Karolinska Institutet
SE-171 76 Stockholm, Sweden
+46-8-517 721 35 (office),
+46-8- 321047(fax), +46-708 388476 (mobile)
Hello, I have a set of coordinates for mm7 that I have been trying to use to
extract the genomic sequences. However, the tool doesn't recognize the chromosome
name column. The names are currently listed as chr1, chr2, ... chrX. This is
the error I get each time I try to extract sequences:
Chromosome by name 'chr1' was not found for build 'mm7'. Skipped 1181
invalid lines, 1st is #1, "chr1 4558068 4561910 region_0 0 +"
However if I change the build to mm10 it works fine - but the coordinates
are not the same between builds. Also, mm7 can't be lifted over to mm9 or
Does anyone know the proper format for chromosome name in mm7:
Has anyone run into this? I'm building a general-purpose filter control in
my Galaxy tool XML template so that numeric fields can be filtered by >,
<, etc. in a user-friendly way. I have a select list <param>
driven by a data table: ...
<param name="filter_column" type="select" label="Col"> <options
from_data_table="bccdc_blast_fields" /> </param>...This works fine in
building a list of fields to select from. Then I add <filter
type="sort_by" column="1"/> <filter type="add_value" name="TESTESTTEST"
but nothing happens: the sort remains incorrect and no extra value appears.
When I try these same filters on a previous <param> in the form that is driven
by <options from_file="bccdc_blast_bins.loc">, they work fine.
I also tried applying <filter type="static_value" ...> to the
from_data_table options tag, to no avail; with from_file options, no problem.
So I'm starting to think there's a bug whereby NO filters work on <options
from_data_table="bccdc_blast_fields" /> input? I've surveyed the
galaxy.tools.parameters.dynamic_options Python code, but can't see a
decision point there at which the from_data_table vs. from_file choice is
made; is it in another script file?
I'm using a BioLinux 2012 install that includes Galaxy.
Damion Dooley
BC Centre for Disease Control
Vancouver, BC
I have some data in tab delimited format and need to convert it to fasta
format. I have been using the Galaxy text formatting tool for this, and other
tools, for a long time. It seems like the server is down and that tool is
Please can you check.
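While the server is unavailable, the conversion itself is simple enough to do locally. The sketch below assumes one record per line with the identifier in the first tab-separated column and the sequence in the second; the post does not state the column layout, and the function name `tabular_to_fasta` and its parameters are made up for illustration.

```python
def tabular_to_fasta(lines, id_col=0, seq_col=1, width=60):
    """Yield FASTA lines for tab-delimited records (assumed: id, sequence)."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        name, seq = fields[id_col], fields[seq_col]
        yield ">" + name
        # Wrap the sequence at a fixed width, as is conventional for FASTA.
        for start in range(0, len(seq), width):
            yield seq[start:start + width]


# Made-up example rows; width=4 only to show the wrapping.
rows = ["seq1\tACGTACGT\n", "\n", "seq2\tGGCC\n"]
fasta_lines = list(tabular_to_fasta(rows, width=4))
print("\n".join(fasta_lines))
```

With a real file, `rows` would be an open file handle, and `id_col` and `seq_col` would be set to match the actual columns.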
I am trying to perform metatranscriptomic analysis on gut samples. Whereas the host transcriptome was easy, the lack
of a reference genome for my mixed bacterial population is a problem.
Any suggestions how to do the alignment? Has anybody done metatranscriptomics on Galaxy before?
A recent paper from Leimena et al. (2013, BMC Genomics) claims they have done the alignment against the complete
NCBI prokaryotic genome database. This sounds like a good solution to me, but I struggle to get this running.
I am normally on the wet side of the work and my experience with programming and R is limited.
The aim is to compare the functional capacity of the bacterial communities in different closely related host species.
Thanks for help,
5130 Comstock Hall
14853 Ithaca, NY
You most likely want to explore tools that are designed specifically for
this purpose, if the reference genome you are talking about is the
assembled transcriptome. Trinity is one tool, but there are others in
the Tool Shed and on some of the Public Servers.
Your question is a bit confusing because the 'annotations' may already
be what these tools would produce and I am not sure what you are trying
to do next. If it is the assignment of putative function, then there are
many paths to follow, some better suited for viral genomes. You'll want
to find out what others doing this exact work are using right now and
consider the same tools. Start by checking out the public Galaxy
servers; many have trial tools that you can later install in a
local/cloud instance from the Tool Shed:
If your question was misunderstood (the reference genome is in fact a
DNA genome - and you have RNA sequence to align), then the RNA-seq
pipeline can be used as-is with 'Tophat for SOLiD', Cufflinks,
CuffMerge, CuffDiff - all on a local/cloud/slipstream with the reference
genome as a cluster reference genome. There is no requirement for
reference annotation with any of these tools - it helps to gain full
functionality, especially with CuffDiff, but is not required. More
assistance is at tophat.cufflinks(a)gmail.com.
Hopefully this helps,
On 10/24/13 6:06 PM, Oscar Aguilar wrote:
> Hi Dr. Jackson,
> I'm sorry to bother you but I have been searching for answers but I
> can't seem to find any and I'm sure that you would be able to answer
> my question.
> So I am trying to find a novel gene using de novo transcriptome
> assembly and I see that TopHat might just be able to help me out with
> my dilemma. The viral genome is not available on the Galaxy website, and
> the other issue is that I am using SOLiD data. So my question is, can
> I use TopHat with SOLiD data by converting to nucleotide base fastq?
> Or do I have to use TopHat2 with a colourspace viral genome? I also
> have to admit that I am completely new to bioinformatics and my
> project has led me here so I am trying to tackle it on my own.
> For the custom genome, I have managed to load it (in fasta, and
> annotation in BED) but I am not sure how to assign the annotations to
> the genome. Also, does TopHat require an annotated genome? I read that
> it doesn't but I'm not sure...I fear that my gene is a spliced one and
> I would like to be able to pull it out from output data.
> I'm sorry to bother you as I'm sure the answer is out there I just
> really can't seem to find it and am now desperate.
> Thank you in advance,
> Oscar A. Aguilar, M.Sc
> PhD Candidate
> Sunnybrook Research Institute
> Department of Immunology
> University of Toronto
> oscar.aguilar(a)utoronto.ca