Hello Galaxy community,
I hope you are doing well. I am starting a new thread to ask for advice on handling large FASTQ files efficiently in Galaxy.
Context:
I am working with Galaxy version 23.1 on a public Galaxy server, running an RNA-seq workflow that covers quality control, trimming, alignment, and feature counting with FastQC, Cutadapt, HISAT2, and featureCounts. The FASTQ files are large (around 20–30 GB each), and I am seeing long runtimes and occasional job failures at the HISAT2 alignment step.
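
To give the list something concrete to look at, here is a minimal BioBlend (Python) sketch of how I plan to pull stderr from the errored alignment jobs. The server URL, API key, and history name are placeholders rather than my real details, and I am assuming a reasonably recent BioBlend release in which get_jobs accepts a history_id filter.

    # Sketch: collect stderr from failed jobs in my analysis history,
    # so I can share the exact error messages with the list.
    # URL, API key, and history name below are placeholders.
    from bioblend.galaxy import GalaxyInstance

    gi = GalaxyInstance(url="https://usegalaxy.org", key="MY_API_KEY")

    # Look up the analysis history by name (assumes one named "rnaseq-run").
    history_id = gi.histories.get_histories(name="rnaseq-run")[0]["id"]

    # Walk the jobs in that history and print stderr for the errored ones.
    for job in gi.jobs.get_jobs(history_id=history_id):
        if job["state"] == "error":
            details = gi.jobs.show_job(job["id"], full_details=True)
            print(job["tool_id"], details.get("stderr", "")[:500])

I will follow up with the actual error output once I have captured it this way.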
I have reviewed the Galaxy documentation and searched through previous mailing list discussions, but I am still unclear about the best practices for managing large datasets efficiently within Galaxy, especially in terms of resource usage and workflow optimization.
Question:
Could anyone please advise on recommended strategies for processing large FASTQ files in Galaxy? For example, are there specific settings, tools, or workflow adjustments that improve performance and stability? Would using dataset collections or an alternative aligner help in this case?
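
For context on the collections idea, here is a rough BioBlend sketch of what I understand by batching paired FASTQ files into a list:paired collection. The dataset IDs, history ID, and sample names are placeholders, not real datasets from my history.

    # Sketch: group paired-end FASTQ datasets into a "list:paired"
    # collection via BioBlend. All IDs below are placeholders.
    from bioblend.galaxy import GalaxyInstance
    from bioblend.galaxy.dataset_collections import (
        CollectionDescription,
        CollectionElement,
        HistoryDatasetElement,
    )

    gi = GalaxyInstance(url="https://usegalaxy.org", key="MY_API_KEY")
    history_id = "PLACEHOLDER_HISTORY_ID"

    # One paired element per sample; R1/R2 are existing dataset IDs.
    samples = {"sampleA": ("R1_DATASET_ID", "R2_DATASET_ID")}

    elements = [
        CollectionElement(
            name=sample,
            type="paired",
            elements=[
                HistoryDatasetElement(name="forward", id=r1),
                HistoryDatasetElement(name="reverse", id=r2),
            ],
        )
        for sample, (r1, r2) in samples.items()
    ]

    collection = gi.histories.create_dataset_collection(
        history_id,
        CollectionDescription(
            name="fastq-pairs", type="list:paired", elements=elements
        ),
    )
    print(collection["id"])

If this is the intended way to feed HISAT2 a collection so that Galaxy maps one job per read pair, a confirmation of that alone would already help me.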
Any guidance, examples, or references would be greatly appreciated.
Thank you very much for your time and support.
Best regards,
Nguyen Van A
University of Florida
https://animalcraft.bitbucket.io