July 2012 - galaxy-user - lists.galaxyproject.org

Initial QC and grooming for Illumina HiSeq2000 paired end RNAseq data
by Lindsey Kelly 11 Jul '12


I am trying to do RNA-seq analysis on paired-end data from the HiSeq 2000. I have about 50 files for each sample (25 forward and 25 reverse, although each sample has a different number of files). I think that I need to:

- convert them into FASTQ Sanger format using the FASTQ Groomer tool
- check the quality using the FastQC tool

I don't know how to handle this many files. Do I have to groom and run the QC for each file? Should I join the paired files and run both tools on each pair, or should I combine all of the data for each sample (which I don't know how to do) and then groom and run the QC for all of the reads for the sample?

Thanks in advance for advice,
Lindsey
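One way to read the "combine all of the data for each sample" option is plain concatenation of the per-sample FASTQ chunks, with forward and reverse reads kept in separate files. The following is a minimal Python sketch of that idea outside Galaxy, not a tested recipe. It assumes uncompressed files and a hypothetical naming scheme (`<sample>_R1_001.fastq`, `<sample>_R2_001.fastq`) that does not come from the original message; the function names are also illustrative.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Hypothetical naming scheme: <sample>_R1_001.fastq / <sample>_R2_001.fastq.
# Adjust the pattern to the names your sequencing facility actually uses.
NAME_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])_(?P<chunk>\d+)\.fastq$")


def group_fastq_files(directory):
    """Return {(sample, read): [paths sorted by chunk number]}."""
    groups = defaultdict(list)
    for path in Path(directory).iterdir():
        match = NAME_PATTERN.match(path.name)
        if match:
            key = (match["sample"], match["read"])
            groups[key].append((int(match["chunk"]), path))
    return {key: [p for _, p in sorted(chunks)] for key, chunks in groups.items()}


def concatenate_groups(directory, out_dir):
    """Write one combined FASTQ per sample and read direction."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for (sample, read), paths in sorted(group_fastq_files(directory).items()):
        target = out_dir / f"{sample}_R{read}.fastq"
        with open(target, "wb") as out:
            # Chunks are appended in chunk-number order for both directions,
            # so the forward and reverse files keep the same read order.
            for path in paths:
                with open(path, "rb") as src:
                    shutil.copyfileobj(src, out)
        written.append(target)
    return written
```

Because both directions are concatenated in the same chunk order, read N in the combined forward file should still pair with read N in the combined reverse file, provided every forward chunk has its matching reverse chunk.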


metagenomics question
by Jennifer Jackson 11 Jul '12


Hello,

The metagenomics data and tools available in Galaxy are described in the publication "Windshield splatter analysis with the Galaxy metagenomic pipeline". The publication and its supplemental material are available on the main public Galaxy instance at: Shared Data -> Shared Published Pages -> Windshield Splatter

http://genome.cshlp.org/content/19/11/2144
http://main.g2.bx.psu.edu/u/aun1/p/windshield-splatter

All methods and tools are explained in detail, including example datasets, histories, workflows, and scientific discussion of results. Hopefully this helps.

Going forward, please send new questions as a brand new thread (not as a reply to an older thread) directly to our mailing list at galaxy-user(a)bx.psu.edu.
http://wiki.g2.bx.psu.edu/Support#Public_mailing_list_Q_.26_A_discussions

Best,
Jen
Galaxy team

On 7/7/12 1:33 AM, Swayamprakash Patel wrote:
> Hello,
> i had run galaxy server for metagenomics study... but, i would like
> to know that which database is used for the comparison... because in my
> sample it had gives me highest no. of eukaryotic community. but actually
> in my data there would be a bacterial community is present in more
> numbers. that's why i have a question like this.

--
Jennifer Jackson
http://galaxyproject.org


fetch codons and amino acid
by bingyu19821270 11 Jul '12


Hi all,

I am a new Galaxy user. I have searched the mailing list for answers to my questions, but without success.

I am trying to fetch the corresponding codon or amino acid alignments among 46 species using genomic intervals in human. I know that if I have a list of human genomic intervals, I can get the nucleotide alignments of these intervals among 46 species. I have hundreds of genomic intervals in human, all located in CDS regions, and I have already fetched the alignment among the 46 species for each genomic interval. What I also want to know is the corresponding codons or amino acids. Can somebody help me out?

Thanks a million!

Patricia Hsu
Kunming Institute of Botany, CAS
bingyu19821270


Tophat
by Jennifer Jackson 10 Jul '12


Hello,

Using the defaults and then testing the resulting SAM output seems to be what most folks are doing if they do not have access to the original library construction methods (e.g. size selection). Both SAM Tools and Picard are in Galaxy. This is a useful post where the options are discussed:
http://www.biostars.org/post/show/16556/estimate-insert-size-in-paired-endm…

Is the data Illumina? The data source may be able to tell you whether the adapter sequence was actually sequenced and whether it has already been removed. If it is present, or you just suspect it is present, they would also have access to the Illumina fasta adapter data. You could also test with FastQC (before or after alignment, maybe on just a sample), then perform a clip based on those results, and re-run. See the tools in 'NGS: QC and manipulation' to perform these tasks.

Going forward, please send questions as a new thread directly to our mailing list at galaxy-user(a)bx.psu.edu.
http://wiki.g2.bx.psu.edu/Support#Public_mailing_list_Q_.26_A_discussions

Best,
Jen
Galaxy team

On 7/10/12 5:36 AM, asma.bioinfo(a)gmail.com wrote:
> Does anyone got correct answer, how to extract the correct distance between two pairs?
>
> One naive question, how can I find the adapter sequence length?
>
> Thanks!

--
Jennifer Jackson
http://galaxyproject.org
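The "test the resulting SAM output" step can be illustrated with a short Python sketch that reads the observed template length (TLEN, column 9 of a SAM record) from properly paired reads and summarizes it. This is a rough illustration under stated assumptions rather than a replacement for the SAM Tools or Picard statistics mentioned above: the function names are hypothetical, and only records with the "properly paired" flag bit (0x2) and a positive TLEN are counted so that each pair contributes once.

```python
import statistics


def insert_sizes(sam_lines):
    """Yield TLEN for each properly paired record with a positive template length."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        tlen = int(fields[8])
        # 0x2 marks a properly paired read; the mate with positive TLEN is the
        # leftmost one, so counting only positive values avoids double counting.
        if flag & 0x2 and tlen > 0:
            yield tlen


def summarize(sam_lines):
    """Return pair count, median and mean template length, or None if no pairs."""
    sizes = list(insert_sizes(sam_lines))
    if not sizes:
        return None
    return {
        "pairs": len(sizes),
        "median": statistics.median(sizes),
        "mean": statistics.mean(sizes),
    }
```

Typical use would be `summarize(open("accepted_hits.sam"))` on a file converted to SAM (the file name here is only an example). Note that TLEN spans the whole fragment including both reads, so an inner distance between mates would be roughly TLEN minus the two read lengths.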


Re: [galaxy-user] Getting the installed data to show up in the sample files at run time
by Carlos Borroto 10 Jul '12


On Fri, Jul 6, 2012 at 3:07 AM, Aarti Desai <aarti_desai(a)persistent.co.in> wrote:
> These show up in the history with the appropriate size. But when I choose
> the “Map with BWA for Illumina” option, the two fastq files do not show up
> in the FASTQ file drop down.

Hi Aarti,

Most tools in Galaxy that work with FASTQ files specifically require fastqsanger input. Even if your files have their quality values in Sanger format, the format auto-detection logic won't be able to determine this. If you know your FASTQ files are in fact in Sanger format, editing the file metadata and selecting fastqsanger as the format will do the trick. If you aren't sure, you should use the FASTQ Groomer tool to convert them to Sanger format.

I would recommend watching the screencasts that describe mapping analysis in Galaxy.

Hope it helps,
Carlos
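For readers who are not sure whether their files are already in Sanger format, a common heuristic is to look at the range of characters on the quality lines. The Python sketch below illustrates that heuristic only; it is not what Galaxy or the FASTQ Groomer does internally. It assumes four-line FASTQ records (no wrapped sequence lines), and the function name and the returned labels are illustrative. The label "phred33" corresponds to the Sanger encoding discussed above.

```python
def guess_quality_encoding(fastq_lines, max_records=10000):
    """Guess the quality offset from the ASCII range of the quality lines."""
    lowest, highest = None, None
    for index, line in enumerate(fastq_lines):
        if index // 4 >= max_records:
            break
        if index % 4 != 3:
            continue  # only every fourth line holds quality characters
        codes = [ord(char) for char in line.rstrip("\r\n")]
        if not codes:
            continue
        lowest = min(codes) if lowest is None else min(lowest, min(codes))
        highest = max(codes) if highest is None else max(highest, max(codes))
    if lowest is None:
        return "unknown"
    if lowest < 59:
        # Characters below ';' (59) cannot occur with an offset of 64.
        return "phred33"
    if highest > 74:
        # Characters above 'J' (74) are unusual for Illumina data with offset 33.
        return "solexa64" if lowest < 64 else "phred64"
    return "ambiguous"
```

An "ambiguous" result means the sampled qualities fit more than one encoding, in which case asking the data provider is safer than guessing.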


"job is waiting to run" forever
by Yehoshua Enuka 09 Jul '12


Dear Sir/Madam, I am a registered user of the public Galaxy Server (main). Any job I submitted today is labeled as "Job is waiting to run" forever. Could you please let me know the possible reasons? Sincerely, Enuka


Indexing files everytime - Performance Issue
by Praveen Raj Somarajan 09 Jul '12


All,

I have noticed that Galaxy/GATK indexes the reference FASTA and dbSNP files every time it runs. Re-indexing takes time (~10 min), so it adds to the overall run time when the tool is used repeatedly. This could be avoided by reusing the existing index. Here is a snapshot of the log:

INFO 11:43:57,365 HelpFormatter - The Genome Analysis Toolkit (GATK) v1.4-21-g30b937d, Compiled 2012/02/01 19:01:14
INFO 11:43:57,365 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 11:43:57,365 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki
INFO 11:43:57,366 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa
INFO 11:43:57,367 HelpFormatter - ---------------------------------------------------------------------------------
INFO 11:43:57,429 GenomeAnalysisEngine - Strictness is STRICT
INFO 11:43:57,432 ReferenceDataSource - Index file /tmp/tmp-gatk-6jlUfH/gatk_input.fasta.fai does not exist. Trying to create it now.
PROGRESS UPDATE: file is 15 percent complete
PROGRESS UPDATE: file is 28 percent complete
PROGRESS UPDATE: file is 91 percent complete
INFO 11:45:32,231 ReferenceDataSource - Dict file /tmp/tmp-gatk-6jlUfH/gatk_input.dict does not exist. Trying to create it now.
INFO 11:45:54,262 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 11:45:54,280 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02
INFO 11:45:54,304 RMDTrackBuilder - Creating Tribble index in memory for file /tmp/tmp-gatk-6jlUfH/input_dbsnp_0.vcf
INFO 11:48:05,910 RMDTrackBuilder - Writing Tribble index to disk for file /tmp/tmp-gatk-6jlUfH/input_dbsnp_0.vcf.idx

Is there any option or alternative in Galaxy to avoid this re-indexing in /tmp, given that I have already built the indexes for the reference and dbSNP? I look forward to any suggestions.
Thanks,
Raj
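The log above shows which companion files GATK looks for next to its inputs: `<reference>.fasta.fai`, a `.dict` file that replaces the FASTA extension, and `<file>.vcf.idx`. As a small diagnostic, the Python sketch below lists which of those files are missing beside a given reference and VCF. It only infers the naming pattern from the log lines, does not change how Galaxy stages files into its temporary directory, and uses hypothetical function names.

```python
from pathlib import Path


def expected_index_files(reference_fasta, vcf_files=()):
    """List the companion files the log shows GATK looking for."""
    reference = Path(reference_fasta)
    expected = [
        Path(str(reference) + ".fai"),   # e.g. gatk_input.fasta.fai
        reference.with_suffix(".dict"),  # e.g. gatk_input.dict
    ]
    # e.g. input_dbsnp_0.vcf.idx
    expected.extend(Path(str(vcf) + ".idx") for vcf in vcf_files)
    return expected


def missing_index_files(reference_fasta, vcf_files=()):
    """Return the expected companion files that do not exist on disk."""
    expected = expected_index_files(reference_fasta, vcf_files)
    return [path for path in expected if not path.exists()]
```

If this reports nothing missing beside the original files while the log still shows indexes being rebuilt under /tmp, that would suggest the inputs are being copied or linked into the temporary directory without their indexes.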


An error in reporting a tool error...
by leah reshef 09 Jul '12


Hello all,

I am working with the LEfSe module. Over the last two days, I have been getting a recurring error in the "Plot LefSe result" step. The LEfSe analysis itself runs OK (I can see the results in a text browser), but when I try to plot them I get a red error message. When I run "plot differential features", I get a zip file, but it's empty. The "plot one feature" function DOES work, but since my datasets have hundreds of species, it's quite difficult to find the relevant features one by one manually.

The worst thing is that when I try to report this error by pressing the "bug" icon in the red failed toolbox, I get this message:

Mail is not configured for this galaxy instance

So I CAN'T even report the error! Help, anyone?


Permission denied error when running fastqc
by Aarti Desai 09 Jul '12


Hello All,

One more problem when running an analysis on a local Galaxy install. I am trying to run FastQC on a FASTQ file I just imported. I have FastQC in ~/Programs/Galaxy/galaxy-dist/tool-data/shared/jars/FastQC

I am getting the following error:

## odpath=None: No output found in None. Output for the run was:
/bin/sh: /root/Programs/Galaxy/galaxy-dist/tool-data/shared/jars/FastQC/fastqc: Permission denied

My guess is that the output directory path is not set. If my guess is correct, the question is: where do I set the path? If my guess is wrong, any help interpreting the error would be greatly appreciated.

Aarti

Aarti Desai, Ph.D | Domain Specialist - Life Sciences
Persistent Systems Ltd.
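One possible reading of the "Permission denied" line is that the shell was not allowed to execute the `fastqc` wrapper script itself, rather than that an output path is missing. The Python sketch below shows how that could be checked and corrected under that assumption; the helper names are hypothetical. It does not cover the other common cause, namely that the account running Galaxy cannot traverse a parent directory such as /root.

```python
import os
import stat


def is_executable(path):
    """True if path is a regular file the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


def add_execute_bits(path):
    """Add execute permission for user, group and others (like chmod a+x)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

For example, calling `is_executable()` on the `fastqc` path from the error message, as the same user that runs Galaxy, would show whether `add_execute_bits()` is worth trying.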


HIDATA
by Daiene Santos 06 Jul '12


After running Cuffdiff, I got more than 95% of genes with HIDATA status. What should I do?
