Hi Kiran,

This question is best directed to the galaxy-dev list, which I have CC'd.

On Apr 23, 2012, at 5:23 AM, Kiran Jaycee wrote:
Dear Nate,
I'm deploying Galaxy in a cluster environment with 8 nodes and 96 cores. I'm particularly interested in running NGS tools like BWA, Bowtie, etc. on the cluster with Torque PBS. I'm wondering whether Galaxy is capable of parallel processing of data: for example, one user running BWA on 10 cores, with 10 instances of BWA (one per core) mapping 250 GB of reads to the reference genome in parallel. Is this possible?
The BWA tool itself supports multithreading; see the threads option in the wrapper. Galaxy can also split a tool's inputs, run the tool many times, and merge the outputs. I am not sure whether this is possible or implemented for BWA, but I would encourage you to look into whether it would work for your needs.
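As a rough illustration of the split/run/merge idea (not Galaxy's own splitting code), the sketch below distributes whole FASTQ records round-robin across several output handles, so each chunk could be mapped by a separate BWA job and the alignments merged afterwards. The function names `read_records` and `split_fastq` are hypothetical, and the sketch assumes the common four-line-per-record FASTQ layout.

```python
import io
from typing import Iterator, List, TextIO


def read_records(handle: TextIO) -> Iterator[List[str]]:
    """Yield FASTQ records as four lines each: header, sequence, '+', quality.

    Assumes unwrapped four-line records (hypothetical helper, not Galaxy code).
    """
    while True:
        raw = [handle.readline() for _ in range(4)]
        if not raw[0]:
            return  # clean end of file
        if not raw[3]:
            raise ValueError("truncated FASTQ record: " + raw[0].strip())
        # Normalise line endings so every line written out ends in "\n".
        yield [line.rstrip("\r\n") + "\n" for line in raw]


def split_fastq(handle: TextIO, outputs: List[TextIO]) -> int:
    """Write whole records round-robin to the given outputs; return the record count."""
    count = 0
    for record in read_records(handle):
        outputs[count % len(outputs)].writelines(record)
        count += 1
    return count


if __name__ == "__main__":
    reads = "@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\n!!!!\n@r3\nTTAA\n+\n####\n"
    chunks = [io.StringIO(), io.StringIO()]
    print(split_fastq(io.StringIO(reads), chunks), "records split")
```

Round-robin keeps the chunks close in size but does not preserve read order; contiguous chunks would be the alternative if order matters. For 250 GB of reads you would pass real file handles opened with `open(path, "w")` rather than `io.StringIO`.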
I have another issue as well. I'm unable to finish FASTQ grooming of the 250 GB of reads even after 2 days of processing. Is there a way to skip the grooming step, or any way to speed it up?
Assuming this is on a local server, make sure you have set the config option set_metadata_externally = True. You don't need to run the groomer if your FASTQ file is already valid fastqsanger.

--nate
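If you are unsure whether the reads are already valid fastqsanger, a quick scan like the hypothetical `scan_fastq` below can help decide whether grooming is needed at all. It is a minimal sketch, not Galaxy's groomer: it assumes four-line records, checks the '@' and '+' markers, matching sequence and quality lengths, and that quality characters fall in ASCII 33-126, and it reports the lowest quality code seen. Because Phred+64 files also use characters in that range, the range check alone cannot prove Sanger encoding; a lowest code below 59 does, however, rule out Solexa and Illumina 1.3+ encodings.

```python
import io
from typing import TextIO, Tuple


def scan_fastq(handle: TextIO) -> Tuple[int, int]:
    """Validate FASTQ structure; return (record_count, lowest quality ASCII code).

    Raises ValueError on structural problems or on quality characters outside
    the range 33-126 ('!' to '~') that Sanger FASTQ allows.
    """
    count = 0
    lowest = 127  # stays 127 if no quality characters are seen
    while True:
        header = handle.readline()
        if not header:
            break  # clean end of file
        seq, plus, qual = [handle.readline().rstrip("\r\n") for _ in range(3)]
        count += 1
        if not header.startswith("@"):
            raise ValueError(f"record {count}: header does not start with '@'")
        if not plus.startswith("+"):
            raise ValueError(f"record {count}: missing '+' separator line")
        if len(seq) != len(qual):
            raise ValueError(f"record {count}: sequence and quality lengths differ")
        if qual:
            # min()/max() on the string avoid a slow per-character Python loop.
            lo, hi = ord(min(qual)), ord(max(qual))
            if lo < 33 or hi > 126:
                raise ValueError(f"record {count}: quality character out of range")
            lowest = min(lowest, lo)
    return count, lowest


if __name__ == "__main__":
    n, lowest = scan_fastq(io.StringIO("@r1\nACGT\n+\nII#I\n"))
    # Codes below 59 cannot occur in Solexa or Illumina 1.3+ (Phred+64) files.
    print(n, "records;", "looks like Phred+33" if lowest < 59 else "encoding unclear")
```

A pure-Python pass over 250 GB will itself take a while, so it may be worth running it on the first part of the file before deciding whether to skip the groomer.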
A quick response would be much appreciated.
Thank you
Regards, Kiran Jaycee
Bioinformatician, BINET ELOGIC Technologies Pvt. Ltd.
#1, 2nd Floor, 100 Ft. Ring Road, Kathriguppe, Banashankari III Stage, Bangalore - 560085
Phone: +91 (0)80 4080 3829
Website: www.elogic.co.in