I have a question about the NGS: Indel Analysis and SNP Calling tools.
Assuming I have loaded my paired-end reads, groomed them, and gotten all the way through alignment with BWA, my question becomes: does the workflow split into separate paths for indel analysis and SNP analysis?
For SNP analysis, it seems that I need to filter the SAM file, convert SAM to BAM, etc.
For indel analysis, it seems that I should use the BWA output in SAM format directly.
Are these two statements correct?
I also have a question regarding the input for indel analysis. Should I use the BWA output directly (which is in SAM format), or should I first run "Filter SAM" and use that output (which is also in SAM format)?
I have tried the indel analysis with both filtered and unfiltered input and get very similar results. It seems to me that I should use the "Filter SAM" output, where I can specify paired = yes, proper pairs = yes, unmapped = no.
Any thoughts, insights, etc.?
Thanks in advance,
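For reference, the "paired = yes, proper pairs = yes, unmapped = no" filter corresponds to three bits of the SAM FLAG field (0x1 paired, 0x2 mapped in a proper pair, 0x4 unmapped, as defined in the SAM specification). The Python sketch below illustrates that filter on plain SAM text; it is not Galaxy's Filter SAM tool itself, and the function names are hypothetical.

```python
# Sketch only: keep SAM records that are paired, properly paired and mapped.
# Header lines (starting with "@") are kept so the output is still valid SAM.
PAIRED = 0x1        # template has multiple segments (paired-end)
PROPER_PAIR = 0x2   # each segment properly aligned according to the aligner
UNMAPPED = 0x4      # this segment is unmapped


def keep_alignment(sam_line: str) -> bool:
    """Return True for header lines and for properly paired, mapped reads."""
    if sam_line.startswith("@"):
        return True
    flag = int(sam_line.split("\t")[1])
    return (
        (flag & PAIRED) != 0
        and (flag & PROPER_PAIR) != 0
        and (flag & UNMAPPED) == 0
    )


def filter_sam(lines):
    """Return only the lines that pass keep_alignment()."""
    return [line for line in lines if keep_alignment(line)]


if __name__ == "__main__":
    example = [
        "@SQ\tSN:chr1\tLN:1000",
        # FLAG 99 = paired + proper pair + mate reverse + first in pair: kept
        "r1\t99\tchr1\t100\t60\t10M\t=\t200\t110\tACGTACGTAC\tIIIIIIIIII",
        # FLAG 4 = unmapped: dropped
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
        # FLAG 65 = paired but not a proper pair: dropped
        "r3\t65\tchr1\t300\t60\t10M\tchr2\t50\t0\tACGTACGTAC\tIIIIIIIIII",
    ]
    for line in filter_sam(example):
        print(line.split("\t")[0])
```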
I am new to Galaxy and I am not sure whether these topics have been discussed
earlier. I followed the steps up to Cufflinks and did not have any
problems. Thanks for the RNA-seq tutorial. My questions are:
1. How do I find the number of reads mapped against the reference genome
after TopHat mapping?
2. I am aware that Cuffdiff is used to find differences in expression.
How do I combine the replicates (three per treatment) of the different treatments?
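On question 1, one way to get the count is from the alignment file itself: a record is mapped when FLAG bit 0x4 is unset, and because TopHat can report a multi-mapped read on several lines, counting distinct read names avoids counting the same read twice. The sketch below assumes the TopHat output has been converted to SAM text; `count_mapped_reads` is a hypothetical helper. For paired-end data it counts fragments with at least one mapped mate, since both mates share a name.

```python
# Sketch only: count distinct read names with at least one mapped alignment.
UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def count_mapped_reads(sam_lines) -> int:
    """Count distinct QNAMEs that have at least one mapped record."""
    names = set()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & UNMAPPED:
            continue
        # A set collapses multi-mapped reads reported on several lines.
        names.add(fields[0])
    return len(names)


if __name__ == "__main__":
    demo = [
        "@HD\tVN:1.0",
        "r1\t0\tchr1\t100\t50\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
        "r1\t256\tchr2\t500\t0\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    ]
    print(count_mapped_reads(demo))
```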
I am trying to use the tool "Compute quality statistics"
in Galaxy on Illumina single-end reads. The file is 2.3 GB, in FASTQ format. I
ran Quality format converter on the dataset and the format is now
qualillumina. Despite this, Galaxy doesn't recognize any dataset in the
workflow as input for quality statistics.
Any idea why my dataset is not accepted as input?
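Since Galaxy only offers a tool the datasets whose datatype it accepts, it can help to confirm which quality encoding the reads actually use. A common heuristic: Sanger-style FASTQ uses Phred+33 (quality characters from ASCII 33 upward) and Illumina 1.3+ uses Phred+64 (from ASCII 64 upward), so any quality character below ASCII 64 points to Phred+33. The sketch below applies that heuristic to the first records of an uncompressed, four-lines-per-record FASTQ; it is an illustration, not what Galaxy does internally. It can misclassify Phred+33 data whose qualities are all high, and it does not separate the older Solexa encoding (which can go down to ASCII 59).

```python
# Sketch only: guess the Phred offset of a 4-line-per-record FASTQ file.
def guess_phred_offset(fastq_lines, max_records: int = 1000) -> int:
    """Return 33 or 64 based on the lowest quality character seen."""
    lowest = None
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:
            continue  # only the 4th line of each record holds qualities
        if i // 4 >= max_records:
            break
        qual = line.rstrip("\n")
        if qual:
            m = min(ord(c) for c in qual)
            lowest = m if lowest is None else min(lowest, m)
    if lowest is None:
        raise ValueError("no quality lines found")
    # Characters below '@' (ASCII 64) cannot occur in Phred+64 data.
    return 33 if lowest < 64 else 64


if __name__ == "__main__":
    records = ["@r1", "ACGT", "+", "!!II", "@r2", "ACGT", "+", "IIII"]
    print(guess_phred_offset(records))
```

Usage on a real file would look like `with open("reads.fastq") as fh: guess_phred_offset(fh)`, where `reads.fastq` is a placeholder name.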
I currently use a Galaxy server with a cluster that uses SGE.
I'd like to specify a queue other than the default.
I have tried many combinations with drmaa:/// without success; the queue
used is always the default one.
Has anyone solved this problem?
tel : +33 (0)5 61 28 54 27
INRA - Unité de BIA
Génopôle - plateforme Bio-informatique
Chemin de Borde-Rouge - AUZEVILLE
BP 52627 - 31326 CASTANET-TOLOSAN CEDEX
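For the SGE queue question: in Galaxy versions that configure job runners through URLs, the DRMAA runner URL could carry native scheduler options between `drmaa://` and the final slash. In that scheme `drmaa:///` passes no options, so SGE uses its default queue, while something like `drmaa://-q long.q/` would request the queue `long.q` (a hypothetical queue name). Please check the exact syntax against the cluster documentation for your Galaxy release. The small Python sketch below only shows how such a URL splits into its native-option string; it does not submit jobs, and `native_spec_from_runner_url` is a hypothetical helper.

```python
# Sketch only: extract the native-option string from a DRMAA runner URL of
# the assumed form "drmaa://<native options>/".
PREFIX = "drmaa://"


def native_spec_from_runner_url(url: str) -> str:
    """Return the text between 'drmaa://' and the trailing '/'."""
    if not url.startswith(PREFIX):
        raise ValueError(f"not a drmaa runner URL: {url!r}")
    spec = url[len(PREFIX):]
    if spec.endswith("/"):
        spec = spec[:-1]
    return spec.strip()


if __name__ == "__main__":
    for url in ("drmaa:///", "drmaa://-q long.q/"):
        spec = native_spec_from_runner_url(url)
        print(repr(url), "->", repr(spec) if spec else "(no native options)")
```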
During visualization of my mRNA-seq data, some areas have a red line
indicating that only the first 5000 reads are displayed. Within this
region, some areas have many reads, but in other areas where exons exist, I
don't see any reads. How do I interpret the data? If no reads are
shown in the visualization even though a red line says that
only the first 5000 reads are displayed, does the absence of reads
over a particular exon mean it is not expressed?
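When a display stops drawing after the first 5000 reads, reads past that cap are simply not shown, so an empty-looking exon inside a truncated window is not by itself evidence of zero expression. Counting reads over the exon directly from the alignments avoids the display cap. The sketch below counts SAM alignment records whose aligned blocks overlap a 1-based, inclusive interval; it splits spliced reads at `N` CIGAR operations so a read that jumps over an exon is not counted for it. The function names and example coordinates are hypothetical, and multi-mapped reads are counted once per alignment record.

```python
import re

# Sketch only: count alignments overlapping a 1-based inclusive interval.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
UNMAPPED = 0x4


def aligned_blocks(pos: int, cigar: str):
    """Return 1-based inclusive (start, end) reference blocks of a read."""
    blocks = []
    ref = pos
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=XD":
            blocks.append((ref, ref + n - 1))  # consumes reference, covered
            ref += n
        elif op == "N":
            ref += n  # intron/skipped region: consumes reference, not covered
        # I, S, H and P do not consume the reference
    return blocks


def count_reads_in_region(sam_lines, chrom: str, start: int, end: int) -> int:
    count = 0
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        flag, rname, pos, cigar = int(f[1]), f[2], int(f[3]), f[5]
        if flag & UNMAPPED or rname != chrom or cigar == "*":
            continue
        if any(b_start <= end and b_end >= start
               for b_start, b_end in aligned_blocks(pos, cigar)):
            count += 1
    return count
```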
Our current Galaxy database is ~600 GB, most of which is user-deleted data.
I followed the instructions here:
and ran the shell scripts in the recommended order. One of them in particular (I
think it was purge_histories.sh) took almost 24 hours to complete. However,
it doesn't appear that any (or at least most) of the files were actually deleted, since we
still have ~600 GB of dataset files. Is there something obvious I can try
to get the files purged correctly?
O'Connor Lab, WNPRC
555 Science Dr. Madison WI
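To see whether a purge run actually freed space, it helps to measure the dataset directory before and after running the scripts. The sketch below sums the sizes of regular files under a directory, skipping symbolic links; `directory_size_bytes` is a hypothetical helper, and the path `database/files` in the usage note is an assumption about where this server stores datasets, so substitute the location configured for your instance.

```python
import os


def directory_size_bytes(root: str) -> int:
    """Sum the sizes of regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total
```

For example, `directory_size_bytes("database/files") / 1e9` gives the total in GB; comparing that number before and after a purge run shows whether any files actually left the disk.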