Filter and trim FASTQ files ensuring consistent read pairing
remove_adapters_and_normalise.rb
$output
$output.id
$__new_file_path__
$read_pattern1
$read_pattern2
#if $data.data_source_select == "predefined"
$GALAXY_DATA_INDEX_DIR/NGS/truseq_adapters.fasta
#else if $data.data_source_select == "user"
$data.NGS_adapters
#else
no_adapters
#end if
#if $options.advanced == "yes"
$options.filter_quality
$options.filter_threshold
$options.trim_quality
$options.trim_fraction
#else
20
90
20
95
#end if
$file1
$file1.name
$file2
$file2.name
This tool performs several actions on paired Next Generation Sequencing (NGS) data.
Input Data
----------
* One or more pairs of NGS reads. The pairs can be specified in any order.
* One pattern that is specific to the 1st read of all pairs, and a second pattern that is specific to the 2nd read of all pairs.
.. class:: warningmark
Assumes that the two files in each pair are identically named apart from the pattern specified.
* Adapter sequences to be screened from the reads. If non-standard adapters are required, users can select an appropriate FASTA file from their History.
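
The pairing rule described above can be sketched in Ruby as follows. This is a minimal illustration, not the tool's actual code: `pair_files` is a hypothetical helper, and the file names and the `_R1`/`_R2` patterns are made-up examples. It assumes the mate's name is obtained by swapping the first pattern for the second.

```ruby
# Hypothetical helper: pair up file names that are identical apart from
# the read pattern. Input order does not matter.
def pair_files(names, pattern1, pattern2)
  firsts = names.select { |name| name.include?(pattern1) }
  firsts.each_with_object([]) do |first, pairs|
    # Assumption: the mate is named by swapping pattern1 for pattern2.
    mate = first.sub(pattern1, pattern2)
    pairs << [first, mate] if names.include?(mate)
  end
end

# Example file names (illustrative only), deliberately out of order.
names = ["sampleB_R2.fastq", "sampleA_R1.fastq",
         "sampleB_R1.fastq", "sampleA_R2.fastq"]
pairs = pair_files(names, "_R1", "_R2")
pairs.each { |first, second| puts "#{first} <-> #{second}" }
```

A file whose mate is missing is simply left out of the result, which is why the naming assumption in the warning above matters.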
Actions
-------
* If adapters have been specified, these short sequences will be removed from the NGS reads, cleaning up the data. This cleanup can take a significant amount of time, up to 1 hour per pair of reads.
* Reads are filtered for quality (by default, Phred score >= 20 for >90% of bases in a read). Reads that fail are discarded.
* Reads are trimmed based on quality (by default, Phred score >= 20 across a consecutive 95% of the read length). Where base quality at the end of the read falls below this value, the base is trimmed, resulting in a shorter read. Where trimming shortens the read to less than the specified fraction of its original length, the read is discarded.
* The two files are compared to ensure that only pairs of reads are retained. Unpaired orphan reads are discarded.
* If a read file is larger than 3 GB, it is reduced so that 15 million reads are retained.
* The pair of read files is interleaved into a single file, containing both forward and reverse reads.
* Finally, the interleaved file is returned to the current History along with a report on the manipulations applied to all NGS reads.
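
The filtering, trimming, pairing and interleaving steps above can be sketched in Ruby as follows. This is an illustration under stated assumptions, not the tool's implementation: the `Read` struct and all method names are hypothetical, quality strings are assumed to be Phred+33 encoded, trimming is assumed to remove low-quality bases from the end of the read only, and mates are assumed to share the same read ID. Adapter removal and the 15 million read cap are not shown.

```ruby
# Hypothetical in-memory read record (not part of the tool).
Read = Struct.new(:id, :seq, :qual)

# Assumption: qualities are Phred+33 encoded ASCII.
def phred_scores(qual)
  qual.bytes.map { |byte| byte - 33 }
end

# Keep a read only if MORE than min_percent of its bases reach min_quality.
def passes_filter?(read, min_quality = 20, min_percent = 90)
  scores = phred_scores(read.qual)
  return false if scores.empty?
  good = scores.count { |score| score >= min_quality }
  good * 100 > min_percent * scores.length
end

# Trim low-quality bases from the end of the read; return nil when less
# than min_fraction percent of the original length remains.
def trim_read(read, min_quality = 20, min_fraction = 95)
  scores = phred_scores(read.qual)
  return nil if scores.empty?
  keep = scores.length
  keep -= 1 while keep > 0 && scores[keep - 1] < min_quality
  return nil if keep * 100 < min_fraction * scores.length
  Read.new(read.id, read.seq[0, keep], read.qual[0, keep])
end

# Apply the quality filter, then the trimmer, dropping discarded reads.
def clean(reads)
  reads.select { |read| passes_filter?(read) }
       .map { |read| trim_read(read) }
       .compact
end

# Keep only reads whose mate survived, emitting forward then reverse.
def interleave_pairs(forward, reverse)
  mates = reverse.each_with_object({}) { |read, by_id| by_id[read.id] = read }
  forward.each_with_object([]) do |read, out|
    mate = mates[read.id]
    out << read << mate if mate
  end
end

# Illustrative data: "I" is Phred 40 and "#" is Phred 2 under Phred+33.
forward = [
  Read.new("r1", "A" * 20, "I" * 20),
  Read.new("r2", "C" * 20, "I" * 19 + "#"),       # last base is trimmed
  Read.new("r3", "G" * 20, "I" * 10 + "#" * 10),  # fails the quality filter
  Read.new("r4", "T" * 40, "I" * 37 + "###")      # trimmed below 95%
]
reverse = %w[r1 r3 r2 r4].map { |id| Read.new(id, "A" * 20, "I" * 20) }

interleaved = interleave_pairs(clean(forward), clean(reverse))
interleaved.each { |read| puts "#{read.id}\t#{read.seq.length}" }
```

In this sketch the reverse reads for `r3` and `r4` are dropped as orphans because their forward mates were discarded, so only the `r1` and `r2` pairs reach the interleaved output.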