November 2011 - galaxy-user - lists.galaxyproject.org

(no subject)
by Xiangming Ding 06 Nov '11


Hi Galaxy,

I am a new Galaxy user. I ran into a problem and did not find a similar question in the FAQ. I wanted to upload data from a DDBJ DRA dataset to Galaxy through the URL method. The file is around 800M. However, after uploading, the FASTQ file was just around 2M. So I want to know whether it is possible to upload a large file to Galaxy through the URL method, or whether I should download the file to my PC and then upload it to Galaxy through the FTP method.

Thanks,
Xiangming


Bed to Wiggle?
by shamsher jagat 04 Nov '11


Is there an option to convert a BED file to a wiggle file in Galaxy? Thanks


BWA install problem
by Christopher Callaway 04 Nov '11


Hi,

I am trying to run Galaxy locally and downloaded BWA, but I can't get it to run. Do I have to use 0.5.6, or can I use 0.5.9? The make command does not work with version 0.5.9.

Thanks,
Christopher W. Callaway
University of Utah
Dept. of Pediatrics
Division of Neonatology
417 Wakara Way #2222
Salt Lake City, UT 84108


Question about Cuffdiff
by Alessia D 04 Nov '11


Hi,

I am confused about the first line in Cuffdiff (using Galaxy on the cloud; not sure if it's different for local instances). It reads:

Transcripts: (choose a file)
A transcript GTF file produced by cufflinks, cuffcompare, or other source.

What file should I load here? I have 4 groups of 2-4 replicates each that I am comparing, and I am using the "grouping" option that follows. Should I use one of the files produced by Cufflinks? If so, which one? Or should I rather use the GTF file of RefSeq genes from the UCSC database?

Thx!!
A


Removal of duplicates and Cufflinks usage
by Lizex Husselmann 04 Nov '11


Hi All,

I've started analyzing my RNA-Seq data for two time points, Day0 and Day4, for control and treated. I've aligned the data to the reference genome using TopHat and removed duplicates from the data sets. Could somebody please tell me how important it is to remove duplicates, and how it will influence my results if I don't remove them?

I want to start with Cufflinks and go all the way through to Cuffdiff. Where do I start, since there are so many options (in the manual) to choose from? What do I look for?

Kind regards,
Lizex


Re: [galaxy-user] Help with analysis
by Alex Daffornel 03 Nov '11


> Brand new Galaxy user here.
>
> I ran an RNA-seq Illumina experiment in which I compare cells from wild-type animals and cells from animals that have a deletion in a splicing factor. Now I have my data in FASTQ format and need to do analysis to figure out which transcripts are changed and how (see below). The problem is, I have no idea whatsoever what to do.
>
> Can someone be so kind as to write down a basic outline of the analysis to follow?
>
> My understanding from what I've been reading online is that once you have FASTQ files you:
>
> 1. Use FASTQ Groomer to convert to Sanger format.
> 2. Evaluate the quality with FASTQ Summary Stats (and get a boxplot of the data).
> 3. Trim reads if their quality doesn't look good. What is considered "ok" quality? A score above 20? Is that the mean score or the absolute score? How do I trim, based on score, only those reads that have a low q score? (Can I?)
> 4. Map the reads. What mapping software do you recommend? BWA or Bowtie? Or TopHat? What next?
>
> Let's say that anything after the trimming is very fuzzy.
>
> The questions I am interested in are:
> - What transcripts are upregulated / downregulated in mutant vs control? (I have 3 replicates of each.)
> - Are there introns that are retained in mutant (but not, or less, in control)?
> - Are there exons that are excluded in mutant? (Basically, I want to look at patterns of alternative splicing.)
>
> Sorry for the very long message, but I have no idea who else to ask.
>
> Thanks!!
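
For the trimming question in step 3, one common approach is to trim low-quality bases from the 3' end of each read rather than discard whole reads. The following is a minimal Python sketch of that idea, assuming Sanger-encoded (Phred+33) quality strings such as FASTQ Groomer produces. The function names and the default threshold of 20 are illustrative assumptions, not the implementation of any Galaxy tool.

```python
def phred33_scores(quality_string):
    """Decode a Sanger (Phred+33) quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def trim_3prime(sequence, quality_string, min_quality=20):
    """Remove trailing bases whose quality is below min_quality.

    Hypothetical helper: trimming stops at the first base (scanning from
    the 3' end) whose score is at least min_quality.
    """
    scores = phred33_scores(quality_string)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_string[:end]


def mean_quality(quality_string):
    """Mean Phred score of a read, or 0.0 for an empty read."""
    scores = phred33_scores(quality_string)
    return sum(scores) / len(scores) if scores else 0.0
```

The per-base trim and the per-read mean answer different questions: the first shortens reads whose ends degrade, while the second can be used to filter out reads that are poor overall.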


cufflinks error
by jh yu 03 Nov '11


Dear all,

Recently I have been using cufflinks to analyze differential expression between different conditions, but when running cufflinks an error occurred:

An error occurred running this job: cufflinks v1.0.3
cufflinks -q --no-update-check -I 1 -F 0.050000 -j 0.050000 -p 8 -g /galaxy/main_database/files/002/991/dataset_2991920.dat
Error running cufflinks.
[bam_header_read] EOF marker is absent.
[bam_header_read] invalid BAM binary header (this

However, when I used the same parameters to analyze another file, it worked well:

19,904 lines
format: gtf, database: rhodRHA1
Info: cufflinks v1.0.3
cufflinks -q --no-update-check -I 1 -F 0.050000 -j 0.050000 -p 8 -g /galaxy/main_database/files/002/991/dataset_2991920.dat

The only difference is the size of the input files: the one that failed is 23 G, while the one that succeeded is 3.5 G. Is the size causing the failure?

Thank you in advance. Best wishes!

Sincerely,
Jinhai YU

Jinhai YU, Ph.D Candidate
010-64888521
Institute of Biophysics, Chinese Academy of Sciences,
15 Datun Road, Chaoyang District, Beijing, 100101, China
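
The `EOF marker is absent` message refers to the empty 28-byte BGZF block that the SAM/BAM specification places at the end of a complete BAM file; when it is missing, the file may have been truncated. The following is a minimal Python sketch for checking this on a local copy of a BAM file. The function name is hypothetical, and this is not what Galaxy or cufflinks run internally.

```python
import os

# Empty BGZF block that terminates a complete BAM file (SAM/BAM specification).
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43"
    " 02 00 1b 00 03 00 00 00 00 00 00 00 00 00"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF block."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        # Read only the last 28 bytes, so this is cheap even for large files.
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF
```

A `False` result would be consistent with an incomplete upload or an interrupted job rather than with a parameter problem, although it does not by itself show where the truncation happened.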


cufflinks set parameters
by dongdong zhaoweiming 03 Nov '11


Hi,

This is my first time using cufflinks in Galaxy, and two options confused me:

Perform Bias Correction: No / Yes
Bias detection and correction can significantly improve accuracy of transcript abundance estimates.

Set Parameters for Paired-end Reads? (not recommended): No / Yes

What is bias correction, and how does it work? And for "Set Parameters for Paired-end Reads? (not recommended)", what is the important difference between the recommended and the not recommended setting?

Thanks a lot!
dongdong


data format
by Klaudyna Borewicz 03 Nov '11


Hi,

I would like to use Galaxy to run LEfSe, but I don't know how to get the data into the tabular format that is required (http://huttenhower.org/galaxy/tool_runner?tool_id=LEfSe_for). My data is 454 FASTA files that I was analyzing with RDP to get the classification. It works fine; I get a .txt file that I can load into Galaxy. It looks like this:

norank Root 37646
unclassified_Root 9
domain Bacteria 37637
unclassified_Bacteria 5998
phylum OD1 0
unclassified_OD1 0
genus OD1_genera_incertae_sedis 0
phylum BRC1 0
unclassified_BRC1 0
genus BRC1_genera_incertae_sedis 0
phylum Deferribacteres 0
unclassified_"Deferribacteres" 0
class Deferribacteres 0
unclassified_Deferribacteres 0
order Deferribacterales 0
unclassified_Deferribacterales 0
family Deferribacterales_incertae_sedis 0
unclassified_Deferribacterales_incertae_sedis 0
genus Caldithrix 0
family Deferribacteraceae 0
unclassified_Deferribacteraceae 0
genus Calditerrivibrio 0
genus Mucispirillum 0
.....

But I need to have the labels in a hierarchical organization, and I cannot find a way to get it to work. Please let me know if you have any suggestions, or maybe RDP is just not the way to go.

Thank you and hope to hear from you soon,
Klaudyna

--
Klaudyna Borewicz M.Sc, B.Sc
Department of Veterinary and Biomedical Sciences
University of Minnesota
1971 Commonwealth Ave.
St. Paul, Minnesota 55108
Phone: (612)624-6226
FAX: (612)625-5203
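
One way to get hierarchical labels from a listing like the one above is to track the current lineage by rank and join the names with a separator. The following is a minimal Python sketch of that idea. It assumes each entry sits on its own line as `rank name count`, that `unclassified_*` entries carry no rank and belong to the most recent ranked taxon, and that `|` is an acceptable level separator (this should be checked against the LEfSe input documentation). The function name and the rank order are illustrative assumptions, not part of RDP or LEfSe.

```python
# Assumed rank order, from the ranks visible in the listing.
RANKS = ["norank", "domain", "phylum", "class", "order", "family", "genus"]


def lineage_labels(lines, separator="|"):
    """Turn 'rank name count' lines into (hierarchical_label, count) pairs."""
    lineage = []  # stack of (rank_depth, name) for the current path
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) == 3 and fields[0] in RANKS:
            rank, name, count = fields
            depth = RANKS.index(rank)
            # Leave any taxa at the same or a deeper rank before descending.
            while lineage and lineage[-1][0] >= depth:
                lineage.pop()
            lineage.append((depth, name))
            names = [n for _, n in lineage]
        elif len(fields) == 2:
            # Assumption: an unranked entry hangs off the current taxon.
            name, count = fields
            names = [n for _, n in lineage] + [name]
        else:
            continue  # skip lines that do not match either shape
        rows.append((separator.join(names), int(count)))
    return rows
```

Each returned pair can then be written out as one tab-separated row per taxon, with one count column per sample once several samples are combined.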


fastx reverse complement failed - gzip: stdout: Broken pipe
by Lucinda Lawson 02 Nov '11


Hi all,

We are running Galaxy on an Ubuntu 11.10 computer (5 TB, stripped, etc.). We are assembling a small genome (110 Gb). Our dataset isn't directly uploaded, but is accessed from a directory (if that matters). Everything went fine through the FASTQ Groomer, but when we ran Reverse-Complement, we got the following error:

fastx_reverse_complement: writing quality scores failed: File too large
gzip: stdout: Broken pipe

Any help that you might have would be greatly appreciated!

Thanks!
-Lucinda
