I am trying to do RNA-seq analysis on paired-end data from the HiSeq 2000. I
have about 50 files for each sample (25 forward and 25 reverse - although
each sample has a different number of files).
I think that I need to:
-convert them into FASTQ Sanger format using the FASTQ Groomer tool
-check the quality using the FastQC tool
I don't know how to handle this many files. Do I have to groom and run the
QC for each file? Should I join the paired files and run both tools on each
pair, or should I combine all of the data for each sample (which I don't
know how to do) and then groom and run the QC for all of the reads for the
Thanks in advance for advice
The publication and supplemental material for the metagenomics data and
tools available in Galaxy described in:
Windshield splatter analysis with the Galaxy metagenomic pipeline
are available on the main public Galaxy instance at:
Shared Data -> Shared Published Pages -> Windshield Splatter
All methods and tools are explained in detail, including example
datasets, histories, workflows, and scientific discussion of results.
Hopefully this helps. Going forward, please send new questions as a brand
new thread (not as a reply to an older thread) directly to our mailing
list at galaxy-user(a)bx.psu.edu.
On 7/7/12 1:33 AM, Swayamprakash Patel wrote:
> Hello,
> I ran a metagenomics study on the Galaxy server, and I would like
> to know which database is used for the comparison. In my sample the
> results report eukaryotes as the most abundant community, but I expect
> bacteria to be more abundant in my data. That is why I am asking.
I am a new Galaxy user. I have searched the mailing list for answers to my questions, without success.
I am trying to fetch the corresponding codon or amino acid alignments among 46 species using genomic intervals in human.
I know if I have a list of human genomic intervals, I can get the nucleotide alignments of these intervals among 46 species. I have hundreds of genomic intervals in human. They are all located in CDS regions. I already fetched the alignment among the 46 species for each genomic interval. The thing is that I also want to know the corresponding codons or amino acids. Can somebody help me out?
Thanks a million!
Kunming Institute of Botany, CAS
Using the defaults and then testing the resulting SAM output seems to be
what most folks are doing if they do not have access to the original
library construction methods (e.g. size selection). Both SAMtools and
Picard are in Galaxy. This is a useful post where the options are discussed:
Is the data Illumina? The data source may be able to tell you if the
adapter sequence was actually sequenced and/or if it was removed already
or not. If present or you just suspect it is present, they would also
have access to the Illumina fasta adapter data. You could also test with
FastQC (before or after alignment, maybe on just a sample), then perform
a clip based on those results, and re-run. See the tools in 'NGS: QC and
manipulation' to perform these tasks.
Going forward, please send questions as a new thread directly to our
mailing list at galaxy-user(a)bx.psu.edu.
On 7/10/12 5:36 AM, asma.bioinfo(a)gmail.com wrote:
> Has anyone found a correct answer for how to extract the correct distance between two pairs?
> One naive question, how can I find the adapter sequence length?
On Fri, Jul 6, 2012 at 3:07 AM, Aarti Desai
> These show up in the history with the appropriate size. But when I choose
> the “Map with BWA for Illumina” option, the two fastq files do not show up
> in the FASTQ file drop down.
Most tools in Galaxy that work with FASTQ files specifically need
fastqsanger input. Even if your files have quality values in
Sanger format, the format auto-detection logic won't be able to determine
this. If you know your FASTQ files are in fact in Sanger format, editing
the file metadata and selecting fastqsanger as the format will do
the trick. If you aren't sure, you should use the FASTQ Groomer tool to
convert them to Sanger format.
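If you want a rough idea of which quality encoding a file uses before deciding between editing the metadata and grooming, the range of characters in the quality lines can hint at it. The following is a minimal Python sketch, not part of Galaxy: the function name guess_phred_offset and the cutoffs (characters below ';' suggest Phred+33, i.e. Sanger; characters above 'J' suggest Phred+64) are assumptions based on the commonly documented encoding ranges, and it assumes plain 4-line FASTQ records with no wrapped lines.

```python
def guess_phred_offset(lines, max_records=1000):
    """Guess the quality offset (33 or 64) from 4-line FASTQ records.

    Heuristic only: returns 33, 64, or None when the observed
    character range does not settle the question.
    """
    lowest, highest = None, None
    for index, line in enumerate(lines):
        if index // 4 >= max_records:
            break
        if index % 4 != 3:  # the quality string is the 4th line of a record
            continue
        codes = [ord(ch) for ch in line.rstrip("\r\n")]
        if not codes:
            continue
        lowest = min(codes) if lowest is None else min(lowest, min(codes))
        highest = max(codes) if highest is None else max(highest, max(codes))
    if lowest is None:
        return None  # no quality lines seen
    if lowest < 59:
        return 33  # characters below ';' do not occur in the +64 encodings
    if highest > 74:
        return 64  # characters above 'J' are unusual for Illumina Phred+33 data
    return None  # ambiguous range
```

Passing an open file handle (for example, open("reads.fastq"), where the file name is hypothetical) is enough, since the function only iterates over lines. A result of None means the sample was ambiguous, in which case running the FASTQ Groomer is the safer route.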
I would recommend watching the several screencasts describing mapping
analysis in Galaxy.
Hope it helps,
I am a registered user of the public Galaxy server (Main). Every job I
submitted today remains labeled "Job is waiting to run". Could you
please let me know the
I noticed that Galaxy/GATK indexes the reference FASTA and dbSNP files every time it runs. Re-indexing takes time (~10 min), so it adds to the overall run time when the tool is run multiple times. This could be avoided by reusing the existing index. Here is a snapshot of the log:
INFO 11:43:57,365 HelpFormatter - The Genome Analysis Toolkit (GATK) v1.4-21-g30b937d, Compiled 2012/02/01 19:01:14
INFO 11:43:57,365 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 11:43:57,365 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki
INFO 11:43:57,366 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa
INFO 11:43:57,367 HelpFormatter - ---------------------------------------------------------------------------------
INFO 11:43:57,429 GenomeAnalysisEngine - Strictness is STRICT
INFO 11:43:57,432 ReferenceDataSource - Index file /tmp/tmp-gatk-6jlUfH/gatk_input.fasta.fai does not exist. Trying to create it now.
PROGRESS UPDATE: file is 15 percent complete
PROGRESS UPDATE: file is 28 percent complete
PROGRESS UPDATE: file is 91 percent complete
INFO 11:45:32,231 ReferenceDataSource - Dict file /tmp/tmp-gatk-6jlUfH/gatk_input.dict does not exist. Trying to create it now.
INFO 11:45:54,262 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 11:45:54,280 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02
INFO 11:45:54,304 RMDTrackBuilder - Creating Tribble index in memory for file /tmp/tmp-gatk-6jlUfH/input_dbsnp_0.vcf
INFO 11:48:05,910 RMDTrackBuilder - Writing Tribble index to disk for file /tmp/tmp-gatk-6jlUfH/input_dbsnp_0.vcf.idx
Is there any option or alternative in Galaxy to avoid this re-indexing in /tmp? I have already built the indexes for the reference and dbSNP.
Look forward to any suggestions.
OCIMUMBIO SOLUTIONS (P) LTD
I am working with the LefSe module. For the last two days, I have been getting a
recurring error in the "Plot LefSe result" step. The LefSe analysis itself
runs OK (I can see the results in a text browser), but when I try
to plot them, I get a red error message. When I run "plot differential
features", I get a zip file, but it's empty. The "plot one feature"
function DOES work, but since my datasets have hundreds of species, it's
quite difficult to find the relevant features one by one manually.
The worst thing is, when I try to report this error by pressing the "bug"
icon in the red failed toolbox, I get this message:
Mail is not configured for this galaxy instance
So I CAN'T even report the error!
One more problem when running an analysis on a local Galaxy install. I am trying to run FastQC on a FASTQ file I just imported. I have FastQC in ~/Programs/Galaxy/galaxy-dist/tool-data/shared/jars/FastQC
I am getting the following error:
## odpath=None: No output found in None. Output for the run was:
/bin/sh: /root/Programs/Galaxy/galaxy-dist/tool-data/shared/jars/FastQC/fastqc: Permission denied
My guess is that the output directory path is not set. If my guess is correct, the question is: where do I set the path?
If my guess is wrong, any help interpreting the error would be greatly appreciated.
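One thing worth checking first: a "Permission denied" message from /bin/sh usually means the shell was not allowed to execute the named file, either because the file lacks the execute bit or because a directory on the path (here something under /root) is not accessible to the user running Galaxy. A minimal Python sketch for checking and setting the execute bit follows. The helper names is_executable and make_executable are made up for illustration, and whether this is the actual cause here is an assumption.

```python
import os
import stat


def is_executable(path):
    """Return True if path is a regular file the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


def make_executable(path):
    """Add execute permission for user, group, and others (like chmod a+x)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

Run as the same user that runs Galaxy, is_executable() on the fastqc script from the log shows whether that user can execute it. If it returns False even after make_executable(), the permissions on the parent directories are the next thing to look at.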
Aarti Desai, Ph.D | Domain Specialist - Life Sciences
aarti_desai(a)persistent.co.in | Cell: +91-9673009492 | Tel: + 91-20-67036348
Persistent Systems Ltd. | Partners in Innovation | www.persistentsys.com