November 2013 - galaxy-user - lists.galaxyproject.org

Transcriptome Hypericum perforatum
by miroslav.sotak 26 Nov '13

To whom it may concern,

I would like to ask whether anyone here has experience with de novo transcriptome analysis (no reference genome available) and could give us some advice. Our main question is how to create the best set of cDNA contigs onto which we can map our RNA-seq reads for differential expression analysis. Currently, four larger sets of RNA-seq reads are available from different genotypes, as well as a draft genome assembly for one of the genotypes. We worry that SNPs between the genotypes will affect the assembly if we combine all the RNA-seq datasets and use assemblers such as Trinity, Oases, or Velvet. Might it be better to use the draft genome assembly to obtain cDNA contigs with TopHat/Cufflinks, using either all available RNA-seq data or only the RNA-seq data from the same genotype as the draft genome?

Thank you in advance.

Best wishes,
Miro Sotak

Re: [galaxy-user] help for trim sequences
by Jennifer Jackson 26 Nov '13

Hi Seung Hee,

You can request that this tool be added to the public Main server at usegalaxy.org through Trello, and the team will consider it. For right now, the options are local or cloud (as in my other reply). Or, you can look around the other public servers hosted by our community. Each is run by a distinct group with its own contact/help/public-use criteria:
http://wiki.galaxyproject.org/PublicGalaxyServers

It may be simplest to see if a local install will do the job, then upload the results to the public server for downstream analysis. Just do the very basics of a "production server" install and then add the tool to test it out. This will take some command-line work to set up, but shouldn't be too much of an investment. The links are:
http://getgalaxy.org
http://usegalaxy.org/toolshed
http://wiki.galaxyproject.org/Tool%20Shed#Installing.2C_maintaining_and_uni…

Local install help/discussion: galaxy-dev(a)bx.psu.edu
Subscribe or search prior Q/A: http://wiki.galaxyproject.org/MailingLists

Take care,
Jen
Galaxy team

On 11/25/13 11:29 AM, Seung Hee Cho wrote:
> Thank you so much for your great help!
> I am trying to use this tool but I am wondering if I can use this
> CutAdapt tool on the public server. I was working on my job on the
> public server, so if not I need to download it for use.
> I truly appreciate your help!
>
> Best,
>
> *Seung Hee Cho*
> Contreras Research Group, CPE 5.416
> The University of Texas at Austin
> Department of Chemical Engineering
> 200 E Dean Keeton St. Stop C0400
> Austin, TX 78712-1589
>
> On Mon, Nov 25, 2013 at 10:08 AM, Jennifer Jackson <jen(a)bx.psu.edu
> <mailto:jen@bx.psu.edu>> wrote:
>
> Thanks Peter for another option!
>
> Jen
> Galaxy team
>
> On 11/23/13 6:19 AM, Peter Cock wrote:
>
> On Fri, Nov 22, 2013 at 8:48 PM, Jennifer Jackson
> <jen(a)bx.psu.edu <mailto:jen@bx.psu.edu>> wrote:
>
> Hi Seung Hee,
>
> I know we discussed this on the other list, but I didn't point you to the
> open development ticket to (potentially) extend the functions of the "Cut"
> tool. This is not being actively worked on right now, but you can follow it
> for updates if you want.
> https://trello.com/c/CbFSHrU5
>
> Others are still welcome to comment about what types of solutions they might
> have to offer. There is no specific tool to do this on Main right now (or in
> the Tool Shed, from my checks). http://usegalaxy.org/toolshed
>
> This tool of mine might do what Seung Hee wanted,
> but I have not tried it on very large Illumina datasets:
>
> http://toolshed.g2.bx.psu.edu/view/peterjc/seq_primer_clip
>
> Regards,
>
> Peter
>
> --
> Jennifer Hillman-Jackson
> http://galaxyproject.org

--
Jennifer Hillman-Jackson
http://galaxyproject.org

Identifying Genes
by Loupe, Jacob M. 26 Nov '13

I am very new to Galaxy. We have performed a comparative analysis between the transcriptomes of different samples, using tools in Galaxy (TopHat, Cuffdiff, etc.). My PI has compiled a list of all the genes differentially expressed between each pair of sample sets, each in a separate Excel sheet. So what I have is an Excel spreadsheet with a list (usually around 300 rows) of test id, gene id, and locus (ChrX:111111111-22222222222).

Initially, we have been identifying each gene individually, one at a time, by pasting the locus into the UCSC browser. This works, but is incredibly tedious. There has to be a better way in Galaxy. I have tried making BED files out of the loci, but so far I have been unable to identify genes using Galaxy.

Can someone please explain how I can take my long list of loci and get gene names, IDs, functions, and possibly some downstream comparative ontologies to begin analyzing? Like I said, I am very new to Galaxy and genomics.

Thanks very much
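
For the BED step mentioned above, a minimal Python sketch of the conversion might look like the following. The function names and the example gene ids are hypothetical, and the coordinate handling is an assumption: the loci are treated as 1-based and inclusive, so the BED start is shifted by one. Check this against your own data before relying on it.

```python
import re

# Matches loci such as "chrX:1000-2000"; commas inside the numbers are tolerated.
LOCUS_RE = re.compile(r"^(?P<chrom>[^:\s]+):(?P<start>[\d,]+)-(?P<end>[\d,]+)$")


def locus_to_bed(locus, name="."):
    """Convert one 'chrom:start-end' locus string into a tab-separated BED line."""
    match = LOCUS_RE.match(locus.strip())
    if match is None:
        raise ValueError(f"unrecognised locus: {locus!r}")
    start = int(match["start"].replace(",", ""))
    end = int(match["end"].replace(",", ""))
    # ASSUMPTION: the locus is 1-based and inclusive, so the BED start
    # (0-based, half-open) is start - 1. Drop the "- 1" if your loci are
    # already 0-based.
    return "\t".join([match["chrom"], str(start - 1), str(end), name])


def loci_to_bed(rows):
    """rows: iterable of (gene_id, locus) pairs -> list of BED lines."""
    return [locus_to_bed(locus, gene_id) for gene_id, locus in rows]


if __name__ == "__main__":
    # Hypothetical rows, as they might be copied out of the spreadsheet.
    example = [("XLOC_000001", "chr1:1000-2000"), ("XLOC_000002", "chrX:5,000-7,500")]
    print("\n".join(loci_to_bed(example)))
```

The resulting BED text could then be uploaded to Galaxy and compared against a gene annotation track for the same genome build, for example with an interval join. That last step is a suggestion only and has not been tested on this dataset.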

help for trim sequences
by Seung Hee Cho 25 Nov '13

Hi,

I am a Galaxy user and I want to trim an exact sequence (not a fixed position) from the 5' end of my reads. Is there any tool I can use for this? For example:

AATGATACGGCGACCACCGAACACTGCGTTTGCTGGCTTTGATG

From this sequence, I want to remove AATGATACGGCGACCACCG, so that I get only "AACACTGCGTTTGCTGGCTTTGATG". If I use Trim sequences or the FASTX Trimmer, the reads are trimmed at an absolute position instead.

Any help would be great. Thank you so much!

Best,
Seung Hee
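
As an illustration of what "trim by sequence, not by position" means, here is a minimal Python sketch. It is not a Galaxy tool and not how CutAdapt or any tool named in this thread works internally; the function name `trim_5prime` is hypothetical, and only exact matches at the very start of the read are removed (no mismatches, no partial adapters).

```python
def trim_5prime(sequence, adapter, quality=None):
    """Strip `adapter` from the start of `sequence` if it matches exactly.

    Returns (sequence, quality). The quality string is trimmed by the same
    amount when it is given, and passed through as None otherwise.
    """
    if adapter and sequence.startswith(adapter):
        cut = len(adapter)
        sequence = sequence[cut:]
        if quality is not None:
            quality = quality[cut:]
    return sequence, quality


if __name__ == "__main__":
    # The example read and leading sequence from the question.
    read = "AATGATACGGCGACCACCGAACACTGCGTTTGCTGGCTTTGATG"
    trimmed, _ = trim_5prime(read, "AATGATACGGCGACCACCG")
    print(trimmed)  # AACACTGCGTTTGCTGGCTTTGATG
```

Reads that do not start with the given sequence are returned unchanged, which is the main difference from position-based trimming.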

Read_group_tracking files are not enough for cummerbund visualisations on replicates?
by Aranday Cortes, Elihu 21 Nov '13

Hi,

Cuffdiff in Galaxy produces "read_group_tracking" files. However, I wonder whether these files are enough when using 'replicates=T' in cummeRbund, or whether I also need the "read_groups.info" and "run.info" files in myDir. I've tried 'replicates(cuff)' with the 'read_group_tracking' files in my directory and I get back an empty set, which suggests that cummeRbund is not using the replicates (or that I'm doing something wrong!).

At this point I don't see any option other than running Cuffdiff locally, unless you have a suggestion about how I can get the 'info' files from Galaxy. Is there any particular reason why the Cuffdiff outputs in Galaxy do not include ALL the files described in the tool manual?

Thanks for your help.

Elihu

Cufflinks returned 0 value in all RPKMs
by Ly, Dao 21 Nov '13

Hi,

I have been trying to analyze a rat SOLiD SRA dataset, but I encountered a problem: Cufflinks gave me 0 RPKM for all genes. Here is my workflow:

1. Get data with EBI SRA: sent the FASTQ file directly to Galaxy
2. FASTQ Groomer
3. Mapped with Bowtie for SOLiD (paired-end) with the built-in rat rn5 index as the reference genome
4. SAM-to-BAM on the Bowtie mapping result
5. Cufflinks on the BAM file

All RPKMs for gene expression and transcript expression have a value of 0, even though the RPKM status is OK. I used the default settings for all jobs. Am I missing something? Any help or suggestion will be greatly appreciated.

Thank you very much.

Best regards,
Dao

QuantMap output issue (chemical names are truncated and cannot be differentiated from each other)
by Hsieh, Jui-Hua (NIH/NIEHS) [E] 21 Nov '13

Hi,

I have submitted chemicals to QuantMap. The chemicals were successfully prepared by "QuantMap Prep" and run by the "QuantMap server". However, in the output files (the dendrogram as well as the cluster information), all of the chemical names are truncated, so chemicals with identical truncated names cannot be distinguished from each other. Could CIDs or full chemical names be reported as well?

Thank you,
Jui-Hua

compute an expression on every row question
by Tobias Hohenauer 21 Nov '13

Hello,

I am looking for the right way to do a computation using "Text Manipulation, Compute an expression on every row". I have a table consisting of 20 columns and about 15,000 rows. Column 1 is my untreated sample (control), and I would like to normalize every other column to this control by simple division. As a result, I would like to obtain a table in which column 1 is set to a value of 1 for every row, and the values in the other columns are the result of column x / column 1.

I tried something like c2/c1,c3/c1,c4/c1... on a small test file, but that results in all normalized values being put into one column in brackets rather than into individual columns. What would be the right parameters? I also thought about creating a workflow consisting of successive divisions of the columns, but that feels rather complicated. Is there an easier way?

Any help would be much appreciated!

Best,
Tobias

--
Tobias Hohenauer, PhD
GCNA, Disease Mechanism Research Core
RIKEN Brain Science Institute
2-1 Hirosawa, Wako-shi 351-0198
Japan
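
Outside Galaxy, the normalization described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not an answer about the Compute tool's parameters. It assumes a tab-separated table with no header row and purely numeric columns, and the function names are hypothetical.

```python
import math


def normalize_row(values, control_index=0):
    """Divide every value in a row by the control column's value."""
    control = float(values[control_index])
    if control == 0:
        # ASSUMPTION: rows with a zero control cannot be normalized, so
        # they are reported as NaN rather than raising an error.
        return [math.nan] * len(values)
    return [float(v) / control for v in values]


def normalize_table(lines, sep="\t"):
    """Normalize tab-separated text lines; returns new tab-separated lines."""
    out = []
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        out.append(sep.join(str(v) for v in normalize_row(fields)))
    return out


if __name__ == "__main__":
    # Column 1 is the control; each output column lands in its own field.
    for row in normalize_table(["2\t4\t1\n", "10\t5\t20\n"]):
        print(row)
```

Because each row is written back as separate tab-delimited fields, column 1 becomes 1.0 in every row with a non-zero control and the other columns hold column x / column 1.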

new cuffdiff with readgroup problem
by Johanna Sandgren 19 Nov '13

Hi,

I have tested the new Cuffdiff version in Galaxy and was very eager to finally get the replicate data for each test. Has anyone tried that yet and succeeded with replicates in the downstream R package cummeRbund? Even though I successfully built the database from the now 15 output files (including read group tracking) with no error message, I cannot plot anything with replicates=T:

Error in sqliteExecStatement(con, statement, bind.data) : RS-DBI driver: (error in statement: near "from": syntax error)

Does anyone know what the problem is?

Thanks,
Johanna

Johanna Sandgren, PhD
Department of Oncology-Pathology
CCK, Karolinska Institutet
SE-171 76 Stockholm, Sweden
+46-8-517 721 35 (office), +46-8- 321047(fax), +46-708 388476 (mobile)

Mean inner distance
by Jennifer Jackson 19 Nov '13

Hi Vanessa,

Mean inner distance can be thought of as the estimated gap left between the two ends of the paired reads. This is from the manual (also at the bottom of the TopHat tool form):

> -r This is the expected (mean) inner distance between mate pairs. For, example, for paired end runs with fragments
> selected at 300bp, where each end is 50bp, you should set -r to be 200. There is no default, and this parameter
> is required for paired end runs.

In short, take the insert size, subtract the length of both sequenced ends, and that is the "Mean inner distance".

Best,
Jen
Galaxy team

On 11/18/13 10:08 AM, Vanessa Lattimore wrote:
> Hi Jen,
> I was hoping you would be able to clarify something for me. What is the mean inner distance between mate pairs in the TopHat tool? I can't find a consensus online on what it is and everything I have found seems to be guesses.
> My data was run on the HiSeq, and the average insert size was ~200 prior to all adaptors being removed. Do I go off this to determine the inner distance or is there a pipeline of tools I need to use to accurately determine it?
> Your help would be greatly appreciated!
> Many regards,
>
> Vanessa.

--
Jennifer Hillman-Jackson
http://galaxyproject.org
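
The rule of thumb above reduces to one subtraction. A minimal Python sketch follows; the function name is hypothetical, and it assumes the fragment size excludes adapters and that both ends were sequenced to the same length.

```python
def mean_inner_distance(fragment_size, read_length):
    """Expected gap between the two mates of a paired-end fragment.

    fragment_size: size-selected fragment length in bp, excluding adapters.
    read_length:   length of each sequenced end in bp.
    """
    return fragment_size - 2 * read_length


# The manual's example: 300 bp fragments, 50 bp sequenced from each end.
print(mean_inner_distance(300, 50))  # 200
```

A negative result would mean the two reads overlap within the fragment.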
