Filter and trim FASTQ files, ensuring consistent read pairing

remove_adapters_and_normalise.rb $output $output.id $__new_file_path__ $read_pattern1 $read_pattern2
#if $data.data_source_select == "predefined"
  $GALAXY_DATA_INDEX_DIR/NGS/truseq_adapters.fasta
#else if $data.data_source_select == "user"
  $data.NGS_adapters
#else
  no_adapters
#end if
#if $options.advanced == "yes"
  $options.filter_quality $options.filter_threshold $options.trim_quality $options.trim_fraction
#else
  20 90 20 95
#end if
$file1 $file1.name $file2 $file2.name

This tool performs several actions on paired Next Generation Sequencing (NGS) data.

Input Data
----------

* One or more pairs of NGS read files. The pairs can be specified in any order.
* One pattern that is specific to the 1st read file of every pair, and a second pattern that is specific to the 2nd read file of every pair.

  .. class:: warningmark

  Assumes that the two files of each pair are named identically apart from the specified pattern.

* Adapter sequences to be screened from the reads. If non-standard adapters are required, users can select an appropriate FASTA file from their History.

Actions
-------

* If adapters have been specified, these short sequences are removed from the NGS reads, cleaning up the data. This cleanup can take a significant amount of time, up to 1 hour per pair of read files.
* Reads are filtered for quality (Phred score >= 20 for >90% of bases in a read). Reads that fail are discarded.
* Reads are trimmed based on quality (Phred score >= 20 across a consecutive 95% of the read length). Where base quality at the end of a read falls below this value, the base is trimmed, resulting in a shorter read. Where trimming shortens a read to less than the specified fraction of its original length (95% by default), the read is discarded.
* The two files are compared to ensure that only complete pairs of reads are retained. Unpaired (orphan) reads are discarded.
* If a read file is larger than 3 Gb, it is reduced so that only 15 million reads are retained.
* The pair of read files is interleaved into a single file containing both forward and reverse reads.
* Finally, the interleaved file is returned to the current History, along with a report on the manipulations applied to all NGS reads.
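The file-pairing rule described under Input Data can be illustrated with a short Ruby sketch. `pair_files` is a hypothetical helper and is not taken from remove_adapters_and_normalise.rb. It assumes that replacing the first occurrence of the 1st-read pattern with the 2nd-read pattern yields the mate's file name. Under that assumption, the order in which the files are supplied does not matter.

```ruby
require "set"

# Hypothetical helper (not from remove_adapters_and_normalise.rb): group
# read files into [read 1, read 2] pairs, assuming both files of a pair are
# named identically apart from read_pattern1 / read_pattern2.
def pair_files(names, read_pattern1, read_pattern2)
  available = names.to_set
  names.select { |name| name.include?(read_pattern1) }.map do |first|
    # Swap the first occurrence of pattern 1 for pattern 2 to name the mate.
    second = first.sub(read_pattern1, read_pattern2)
    raise ArgumentError, "no mate found for #{first}" unless available.include?(second)
    [first, second]
  end
end

# Files may be supplied in any order.
files = ["sampleB_R2.fastq", "sampleA_R1.fastq", "sampleB_R1.fastq", "sampleA_R2.fastq"]
p pair_files(files, "_R1", "_R2")
# => [["sampleA_R1.fastq", "sampleA_R2.fastq"], ["sampleB_R1.fastq", "sampleB_R2.fastq"]]
```

A file whose mate is missing raises an error in this sketch rather than being skipped silently. This mirrors the warning that the naming assumption must hold.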
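The default quality filter and trim settings (20/90 and 20/95) can be read as the following Ruby sketch. It is one interpretation of the help text, not the script's actual implementation. It assumes Phred+33 quality encoding and trims only low-quality bases from the 3' end. It also treats the trim fraction as the minimum percentage of the original read length that must remain. The method names are hypothetical.

```ruby
# Assumption: qualities are Phred+33 encoded (Sanger / Illumina 1.8+).
def phred_scores(qual)
  qual.bytes.map { |b| b - 33 }
end

# Filter: keep the read only if more than filter_threshold percent of its
# bases have a Phred score >= filter_quality.
def passes_filter?(qual, filter_quality = 20, filter_threshold = 90)
  scores = phred_scores(qual)
  return false if scores.empty?
  good = scores.count { |q| q >= filter_quality }
  good * 100 > filter_threshold * scores.length
end

# Trim: remove bases from the 3' end while their score is below
# trim_quality; return nil (discard) if fewer than trim_fraction percent
# of the original bases remain, otherwise the trimmed [seq, qual].
def trim_read(seq, qual, trim_quality = 20, trim_fraction = 95)
  scores = phred_scores(qual)
  keep = scores.length
  keep -= 1 while keep > 0 && scores[keep - 1] < trim_quality
  return nil if keep * 100 < trim_fraction * scores.length
  [seq[0, keep], qual[0, keep]]
end

seq = "ACGT" * 5                  # 20 bases
qual = "I" * 19 + "#"             # 'I' = Q40, '#' = Q2
p passes_filter?(qual)            # 19 of 20 bases (95%) are >= Q20
p trim_read(seq, qual)            # last base trimmed; 19/20 = 95% remains
```

Integer arithmetic (`good * 100 > threshold * length`) is used so the percentage comparisons are exact rather than subject to floating-point rounding.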
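The re-pairing and interleaving steps could look roughly like this in Ruby. `FastqRecord`, `pair_key`, `repair_and_interleave` and `to_fastq` are hypothetical names. The sketch assumes that mates share a read name up to the first whitespace, once a trailing /1 or /2 is removed. It does not reproduce the tool's size-based reduction step.

```ruby
# A FASTQ record; the header is stored without its leading '@'.
FastqRecord = Struct.new(:header, :seq, :qual)

# Pair key: header up to the first whitespace, minus a trailing /1 or /2.
# (Assumption: mates share this key, as in common Illumina naming.)
def pair_key(header)
  header[/\A\S*/].sub(%r{/[12]\z}, "")
end

# Keep only reads whose mate survived in the other file, then interleave
# them as read 1, read 2, read 1, read 2, ... in the order of the 1st file.
# Reads present in only one file (orphans) are dropped.
def repair_and_interleave(reads1, reads2)
  mates = {}
  reads2.each { |r| mates[pair_key(r.header)] = r }
  reads1.each_with_object([]) do |r1, out|
    r2 = mates[pair_key(r1.header)]
    next unless r2
    out << r1 << r2
  end
end

# Serialise records back to four-line FASTQ text.
def to_fastq(records)
  records.map { |r| "@#{r.header}\n#{r.seq}\n+\n#{r.qual}\n" }.join
end

reads1 = [FastqRecord.new("a/1", "ACGT", "IIII"), FastqRecord.new("b/1", "GGCC", "IIII")]
reads2 = [FastqRecord.new("b/2", "TTAA", "IIII"), FastqRecord.new("c/2", "CATG", "IIII")]
# Only pair b is written; a/1 and c/2 are orphans.
print to_fastq(repair_and_interleave(reads1, reads2))
```

Indexing the 2nd file by key keeps re-pairing linear in the number of reads. A real implementation working on multi-gigabyte files would more likely stream or sort the records than hold both files in memory.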