To whom it may concern
I would like to ask whether anyone with experience in de novo
transcriptome analysis (no reference genome available) could give
us some advice.
Our main question is how to create the best set of cDNA contigs onto
which we can map our RNAseq reads for differential expression
analysis. Currently, four large sets of RNAseq reads are available
from different genotypes, as well as a draft genome assembly for one of
the genotypes. We worry that SNPs between the genotypes will affect the
assembly if we combine all the RNAseq datasets and use assemblers
such as Trinity, Oases, or Velvet. Would it be better to use the draft
genome assembly to obtain cDNA contigs with TopHat/Cufflinks, using
either all available RNAseq data or only the RNAseq data from the same
genotype as the draft genome?
Thank you in advance
Hi Seung Hee,
You can request that this tool be added to the public Main server at
usegalaxy.org through Trello, and the team will consider it. For now,
the options are a local or cloud instance (as in my other reply).
Or, you can look around the other public servers hosted by our
community - each is run by a distinct group with their own
It may be simplest to see if a local instance will do the job, then
upload the results to the public server for downstream analysis. Just do
the very basics of a "production server" install and then add the tool
to test it out. This will take some command-line work to set up, but
shouldn't be too much of an investment. The links are:
Local install help/discussion: galaxy-dev(a)bx.psu.edu
Subscribe or search prior Q/A: http://wiki.galaxyproject.org/MailingLists
On 11/25/13 11:29 AM, Seung Hee Cho wrote:
> Thank you so much for your great help!
> I am trying to use this tool, but I am wondering if I can use the
> CutAdapt tool on the public server. I was running my job on the
> public server, so if not, I need to download it.
> I truly appreciate your help!
> *Seung Hee Cho*
> Contreras Research Group, CPE 5.416
> The University of Texas at Austin
> Department of Chemical Engineering
> 200 E Dean Keeton St. Stop C0400
> Austin, TX 78712-1589
> On Mon, Nov 25, 2013 at 10:08 AM, Jennifer Jackson <jen(a)bx.psu.edu
> <mailto:email@example.com>> wrote:
> Thanks Peter for another option!
> Galaxy team
> On 11/23/13 6:19 AM, Peter Cock wrote:
> On Fri, Nov 22, 2013 at 8:48 PM, Jennifer Jackson
> <jen(a)bx.psu.edu <mailto:firstname.lastname@example.org>> wrote:
> Hi Seung Hee,
> I know we discussed this on the other list, but I didn't point you to
> the open development ticket to (potentially) extend the functions of
> the "Cut" tool. This is not being actively worked on right now, but
> you can follow it for updates if you want.
> Others are still welcome to comment about what types of solutions
> they might have to offer. There is no specific tool to do this on
> Main right now (or in the Tool Shed, from my checks).
> http://usegalaxy.org/toolshed
> This tool of mine might do what Seung Hee wanted,
> but I have not tried it on very large Illumina datasets:
> Jennifer Hillman-Jackson
I am very new to Galaxy. We have performed a comparative analysis of the transcriptomes of different samples using Galaxy tools (TopHat, Cuffdiff, etc.). My PI has compiled a list of all the genes differentially expressed between each set, each in a separate Excel sheet. So what I have is an Excel spreadsheet with a list (usually around 300 entries) of test ID, gene ID, and locus (ChrX:111111111-22222222222). So far, we have been identifying each gene individually, one at a time, by pasting the locus into the UCSC browser. This works, but it is incredibly tedious, and there has to be a better way in Galaxy. I have tried making BED files from the loci, but so far I have been unable to identify the genes using Galaxy.
Can someone please explain how I can take my long list of loci and get gene names, IDs, functions, and possibly some downstream comparative ontology analysis to begin analyzing?
Like I said, I am very new to Galaxy and genomics.
Thanks very much
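Since the spreadsheet loci are already in "chrom:start-end" form, the BED conversion itself can be scripted. Below is a minimal sketch, assuming the loci use UCSC-browser-style 1-based, inclusive coordinates (the same form pasted into the browser) and that the sheet is exported as rows of (test ID, gene ID, locus). BED is 0-based and half-open, so only the start shifts by one. The resulting BED file could then be compared against a gene annotation track; the function names and the example ID here are hypothetical.

```python
import re

# Matches loci such as "chrX:1000-2000"; commas inside numbers are tolerated.
LOCUS_RE = re.compile(r"^\s*([^:\s]+):([\d,]+)-([\d,]+)\s*$")


def locus_to_bed(locus: str) -> tuple[str, int, int]:
    """Convert a 1-based, inclusive "chrom:start-end" locus into a
    0-based, half-open BED interval (chrom, start, end).

    ASSUMPTION: the input uses UCSC browser-style 1-based coordinates.
    """
    match = LOCUS_RE.match(locus)
    if match is None:
        raise ValueError(f"unrecognised locus: {locus!r}")
    chrom, start, end = match.groups()
    start_1based = int(start.replace(",", ""))
    end_1based = int(end.replace(",", ""))
    # BED starts are 0-based; BED ends are exclusive, so the 1-based
    # inclusive end keeps the same number.
    return chrom, start_1based - 1, end_1based


def loci_to_bed_lines(rows):
    """rows: iterable of (test_id, gene_id, locus) tuples (hypothetical
    layout of the exported spreadsheet). Yields BED6 lines that use the
    gene ID as the name field, a score of 0 and an unknown strand."""
    for _test_id, gene_id, locus in rows:
        chrom, start, end = locus_to_bed(locus)
        yield f"{chrom}\t{start}\t{end}\t{gene_id}\t0\t."


if __name__ == "__main__":
    example = [("XLOC_000001", "XLOC_000001", "chr1:1,000-2,000")]
    for line in loci_to_bed_lines(example):
        print(line)
```

If the loci turn out to be 0-based already, dropping the `- 1` is the only change needed.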
Hi, I am a Galaxy user and I want to trim exact sequences (not by
location) from the 5' end. Is there any tool I can use for this?
From this sequence, I want to remove *AATGATACGGCGACCACCG*,
so that I get only "*AACACTGCGTTTGCTGGCTTTGATG*".
If I use Trim sequences or FASTX Trimmer, the reads are trimmed at
absolute positions instead. Any help would be great. Thank you so much!
*Seung Hee *
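Trimming by exact sequence rather than position can be sketched in a few lines. Below is a minimal Python sketch, assuming unwrapped four-line FASTQ records and an exact, mismatch-free match at the very start of the read. A read that does not start with the prefix is left untouched, and the quality string is cut in step with the sequence. Adapter trimmers such as cutadapt (mentioned in the thread) also tolerate sequencing errors, which this sketch does not. The full read is not given in the post, so the test read is simply the prefix joined to the wanted part, for illustration only.

```python
def trim_5prime(seq: str, qual: str, prefix: str) -> tuple[str, str]:
    """Remove `prefix` from the 5' end only if the read starts with exactly
    that sequence; the quality string is trimmed by the same amount."""
    if seq.startswith(prefix):
        n = len(prefix)
        return seq[n:], qual[n:]
    return seq, qual


def trim_fastq(lines, prefix: str):
    """lines: iterable of FASTQ lines (exactly 4 lines per record, no
    wrapping). Yields the trimmed lines without trailing newlines."""
    it = iter(lines)
    # zip over the same iterator groups consecutive lines into 4-line
    # records; an incomplete trailing record is silently dropped.
    for header, seq, plus, qual in zip(it, it, it, it):
        new_seq, new_qual = trim_5prime(seq.rstrip("\n"), qual.rstrip("\n"), prefix)
        yield from (header.rstrip("\n"), new_seq, plus.rstrip("\n"), new_qual)
```

For example, `trim_5prime("AATGATACGGCGACCACCG" + "AACACTGCGTTTGCTGGCTTTGATG", qual, "AATGATACGGCGACCACCG")` keeps only `AACACTGCGTTTGCTGGCTTTGATG` and the matching slice of `qual`.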
Cuffdiff in Galaxy produces "read_group_tracking" files. However, I wonder whether these files are enough when using 'replicates=T' in cummeRbund, or whether I also need the "read_groups.info" and "run.info" files in myDir.
I've tried 'replicates(cuff)' with the 'read_group_tracking' files in my directory and I get back an empty set, which means that cummeRbund is not using replicates (or I'm doing something wrong!). At this point I don't see any option other than running Cuffdiff locally. If you have any suggestions or advice about how I can get the '.info' files from Galaxy, I would appreciate it.
Is there any particular reason why the Cuffdiff outputs in Galaxy do not include ALL the files described in the tool manual?
Thanks for your help.
I have been trying to analyze a rat SOLiD SRA dataset, but I encountered a problem: Cufflinks gave me 0 RPKM for all genes. Here is my workflow:
1. Get data from EBI SRA: sent the FASTQ file directly to Galaxy
2. FASTQ Groomer
3. Mapped with Bowtie for SOLiD (paired-end), using the built-in rat rn5 index as the reference genome
4. SAM-to-BAM on the Bowtie mapping result
5. Cufflinks on the BAM file
All gene and transcript expression RPKMs are 0, even though the RPKM status is OK. I used default settings for all jobs. Am I missing something? Any help or suggestions would be greatly appreciated. Thank you very much.
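One thing worth ruling out, offered as a possible check rather than a diagnosis, is whether the step 3 output actually contains mapped reads, and on which reference names. Below is a minimal sketch, assuming the Bowtie output is available as SAM text. It reads the FLAG field (bit 0x4 marks an unmapped segment) and tallies mapped records per RNAME (column 3). If nearly everything is unmapped, there is nothing for Cufflinks to quantify. Note that every alignment line is counted, so secondary alignments of the same read count more than once.

```python
from collections import Counter


def summarize_sam(lines):
    """Count mapped and unmapped alignment records in SAM text, and tally
    mapped records per reference name (RNAME, column 3)."""
    mapped = unmapped = 0
    per_reference = Counter()
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:  # FLAG bit 0x4: segment unmapped
            unmapped += 1
        else:
            mapped += 1
            per_reference[fields[2]] += 1
    return mapped, unmapped, per_reference
```

Usage: `mapped, unmapped, per_ref = summarize_sam(open("bowtie_output.sam"))`, where the file name is hypothetical.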
I have submitted chemicals to QuantMap. The chemicals were successfully prepared by "QuantMap Prep" and run by "QuantMap server".
However, in the output files (the dendrogram as well as the cluster information), all of the chemical names are truncated.
As a result, chemicals whose truncated names are identical cannot be distinguished from each other.
I wonder whether CIDs or full chemical names could be reported as well.
I am looking for the right way to do a computation using "text
manipulation, compute an expression on every row".
I have a table consisting of 20 columns and about 15,000 rows.
Column 1 is my untreated control, and I would like to normalize every
other column to this control by simple division.
As a result, I would like to obtain a table in which column 1 is set to
a value of 1 for every row, and each other column x holds the value of
column x / column 1.
I tried something like c2/c1,c3/c1,c4/c1... on a small test file, but that
puts all the normalized values into one column, in brackets, rather than
into individual columns. What would be the right syntax?
I also thought about creating a workflow of successive divisions of the
columns, but that feels rather complicated. Is there an easier way?
Any help would be much appreciated!
Tobias Hohenauer, PhD
GCNA, Disease Mechanism Research Core
RIKEN Brain Science Institute
2-1 Hirosawa, Wako-shi
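As an alternative to chaining Compute steps, the same column-wise normalization can be sketched outside Galaxy. Below is a minimal Python sketch, assuming a headerless, tab-separated, all-numeric table with the control in column 1. Every value in a row is divided by that row's control value, so column 1 becomes 1.0 and each output value stays in its own column. Rows whose control is 0 are written as NA, since the ratio is undefined there. The NA convention is an assumption, not something the original post specifies.

```python
import csv
import io


def normalize_row(values):
    """Divide every value in the row by the first (control) value, so
    column 1 becomes 1.0. Returns None entries if the control is zero."""
    control = values[0]
    if control == 0:
        return [None] * len(values)
    return [v / control for v in values]


def normalize_table(text: str) -> str:
    """text: headerless tab-separated numeric table, control in column 1.
    Returns the normalized table as tab-separated text, one column per value."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue  # skip empty lines
        ratios = normalize_row([float(x) for x in row])
        writer.writerow("NA" if r is None else r for r in ratios)
    return out.getvalue()
```

For example, `normalize_table("2\t4\t1\n")` returns `"1.0\t2.0\t0.5\n"`.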
I have tested the new Cuffdiff version in Galaxy and was very eager to finally get the replicate data for each test. Has anyone tried this yet and succeeded in using replicates in the downstream R package cummeRbund? Even though I successfully built the database from the 15 output files now produced (including read group tracking), with no error message, I cannot plot anything with replicates=T. I get this error:
Error in sqliteExecStatement(con, statement, bind.data) :
RS-DBI driver: (error in statement: near "from": syntax error)
Does anyone know what the problem is?
Johanna Sandgren, PhD
Department of Oncology-Pathology
CCK, Karolinska Institutet
SE-171 76 Stockholm, Sweden
+46-8-517 721 35 (office),
+46-8- 321047(fax), +46-708 388476 (mobile)
Mean inner distance can be thought of as being the estimated gap left
between the two ends of the paired reads. This is from the manual (also
at the bottom of the Tophat tool form):
> -r This is the expected (mean) inner distance between mate pairs. For, example, for paired end runs with fragments
> selected at 300bp, where each end is 50bp, you should set -r to be 200. There is no default, and this parameter
> is required for paired end runs.
In short, take the insert size, subtract the length of both sequenced
ends, and that is the "Mean inner distance".
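The arithmetic from the manual excerpt can be written as a one-line function. This sketch takes the two read lengths separately, an assumption that also covers runs where the mates differ in length. With the manual's own numbers (300 bp fragments, 50 bp ends), it gives 200. A negative result simply means the two mates overlap. Vanessa's read length isn't stated, so her value can't be computed here. As a hypothetical illustration, 2 x 100 bp reads on a ~200 bp insert would give about 0.

```python
def mean_inner_distance(insert_size: int, read1_len: int, read2_len: int) -> int:
    """Inner distance = fragment (insert) size minus the lengths of both
    sequenced ends. A negative result means the two mates overlap."""
    return insert_size - read1_len - read2_len


if __name__ == "__main__":
    # The example from the TopHat manual: 300 bp fragments, 50 bp ends.
    print(mean_inner_distance(300, 50, 50))
```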
On 11/18/13 10:08 AM, Vanessa Lattimore wrote:
> Hi Jen,
> I was hoping you would be able to clarify something for me. What is the mean inner distance between mate pairs in the TopHat tool? I can't find a consensus online on what it is and everything I have found seems to be guesses.
> My data was run on the HiSeq, and the average insert size was ~200 prior to all adaptors being removed. Do I go off this to determine the inner distance or is there a pipeline of tools I need to use to accurately determine it?
> Your help would be greatly appreciated!
> Many regards,