I have been trying to analyze some recently acquired WGS reads
(re-sequencing with MiSeq), but I am running into problems with both the
Picard and GATK tools and I cannot tell where they come from.
My fastq reads are already in the Sanger / Illumina 1.9 format, as
recognized by the FastQC tool. I changed the datatype attribute of the
read files from fastq to fastqsanger and successfully performed a BWA
mapping against my reference sequence.
I then filtered the resulting SAM file with "NGS: SAM Tools, Filter
SAM" to keep only paired-mapped reads and reordered the file with "NGS:
Picard, Reorder SAM/BAM", enabling the option "Truncate sequence names
after first whitespace".
Since my reads are highly duplicated (according to the FastQC output), I
ran the "NGS: Picard, Mark Duplicate reads" tool, but it removed only 2
duplicated reads. I then added a Read Group with "NGS: Picard, Add or
Replace Groups" and started the SNP calling with GATK, beginning with
the Realigner Target Creator tool. This produced an empty file, which
made me think something is wrong.
So I tried the mapping again (as suggested on the GATK wiki for someone
who also got an empty file) and ran the same steps on reads from a
different sample, but I always get the same strange results from the
de-duplication step and the Realigner tool.
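
In case it helps to narrow this down, here is a minimal sketch of how I
could check what each step actually left in the file. It assumes the
output is plain-text SAM (not BAM), and the function name tally_flags is
my own, not part of Picard or SAMtools. It counts the standard SAM FLAG
bits (0x1 paired, 0x2 properly paired, 0x4 unmapped, 0x400 duplicate)
and how many records carry an RG:Z: tag.

```python
from collections import Counter

# Standard SAM FLAG bits relevant to the steps described above.
FLAG_BITS = {
    "paired": 0x1,
    "proper_pair": 0x2,
    "unmapped": 0x4,
    "duplicate": 0x400,
}


def tally_flags(sam_lines):
    """Count FLAG bits and read-group tags over an iterable of SAM lines."""
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and header lines (which start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # FLAG is the second mandatory column
        counts["total"] += 1
        for name, bit in FLAG_BITS.items():
            if flag & bit:
                counts[name] += 1
        # Optional tags start at column 12; RG:Z: marks the read group.
        if any(tag.startswith("RG:Z:") for tag in fields[11:]):
            counts["has_read_group"] += 1
    return counts
```

It could be run on a downloaded file with something like
tally_flags(open("mapped.sam")), where mapped.sam is a placeholder name.
If "duplicate" stays near zero while FastQC reports heavy duplication,
or "has_read_group" is zero after adding a read group, that would at
least show which step is not behaving as I expect.
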
I think something goes wrong during the BWA mapping step, or even in my
fastq reads, but I cannot work out what it is.
Also, which read quality format do the Galaxy tools accept? I know it is
PHRED+33, but what does it look like?
I ran the BWA mapping with both types and it worked, but maybe the
problem lies somewhere here.
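
As far as I understand it, PHRED+33 means each quality score Q is stored
as the ASCII character with code Q + 33, so the quality line of a read
looks like a run of characters such as "IIII#5?". Below is a minimal
sketch of how I would decode and sanity-check that. The function names
are my own, and the offset check is only a heuristic: it relies on the
fact that characters with ASCII code below 59 (';') cannot occur in the
older +64 encodings.

```python
def phred33_scores(quality_string):
    """Decode a FASTQ quality string, assuming the PHRED+33 offset."""
    return [ord(ch) - 33 for ch in quality_string]


def looks_like_phred33(quality_strings):
    """Heuristic check: True if any character has an ASCII code below 59.

    Such characters are valid in PHRED+33 but cannot appear in the
    older +64 encodings. A False result is inconclusive, because a
    short set of very high-quality PHRED+33 reads may contain no such
    characters either.
    """
    lowest = min(ord(ch) for q in quality_strings for ch in q)
    return lowest < 59


# 'I' has ASCII code 73, so it decodes to Q40; '#' (35) decodes to Q2.
print(phred33_scores("I#5?"))  # [40, 2, 20, 30]
```
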
I hope someone can help me!