Re: [galaxy-dev] Help Needed
Hello Elsayed,
Your protocol below seems to be a mix of a variant detection workflow and an RNA-seq workflow. To build a workflow for RNA-seq, you will want to compare your steps with the protocols in the link that I sent you. If you want more examples, many more can be found here (for both RNA-seq and variant analysis; these are distinct analyses). Some have workflows that you can import and use directly or modify.
https://wiki.galaxyproject.org/Learn#Other_Tutorials
Shared Data -> Published Pages
In particular, there is a sample workflow in this tutorial that will help you to understand what the different steps are for and what order they go in.
https://usegalaxy.org/u/jeremy/p/galaxy-rna-seq-analysis-exercise
Best,
Jen
Galaxy team
On 1/3/14 10:57 AM, Elsayed Hejazy wrote:
Dear Dr. Jennifer,
Many thanks for your reply; it really means a lot to me.
I am still new to NGS. I am trying to do the following steps; kindly give me the right order for these procedures.
Data description: I have two samples. Each sample consists of 14 FASTQ files (7 forward and 7 reverse). I think this means it is paired-end data from Illumina. I will then try the following workflow to get the best results:
1- Drag two input datasets into the workflow, one for the forward sequence files and one for the reverse, so that I can use the paired-end option in the TopHat tool later. When I run this workflow, I will use multiple selection for the 7 forward files to analyse all of them at the same time.
2- Drag FastQC and link it to the previous step for each input, to find out whether the files are Illumina 1.8 or older.
3- Drag FASTQ Groomer and link it to the previous step, if the files are older than version 1.8, to convert them to fastqsanger format.
4- Drag Filter FASTQ and link it to the previous step to remove redundant sequences.
5- Drag FASTQ Trimmer to remove any unwanted read ends that may occur.
6- Drag Manipulate FASTQ and link it to the previous step (I don't know why).
The above six steps are done twice, to generate two files as output, which serve as input for the following steps.
7- Drag TopHat for Illumina, set it to accept paired-end files, and link each file generated from QC to TopHat. This step aligns and maps the reads to the reference genome.
8- Drag Cufflinks and link it to the aligned BAM file generated by TopHat to create an assembly.
9- Drag Cuffmerge and link it to the GTF file from Cufflinks. This step merges all assemblies generated by Cufflinks.
10- Drag Cuffcompare and link it to the previous step to get detailed reports on the accuracy of all generated assemblies.
11- Drag MPileup and link it to the TopHat BAM file to generate a file containing SNP sites.
12- Drag Pileup-to-Interval and link it to the MPileup step to filter the output SNPs down to the successful or most accurate ones. (I don't know what the difference is between this tool and Filter Pileup.)
- I don't know which tools are used to detect copy number variation (CNV). I need to know how to separate human sequences from a sample that may be infected with other sequences (is this done at the alignment stage?). I need to know the correct order of steps if this order is not completely right.
Is this what I should do to build a good NGS workflow that gets all possible information from the dataset?
I am really very sorry for the disturbance. Awaiting your reply.
Best regards,
Elsayed
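Steps 2 and 3 above depend on knowing whether the quality scores use the older Illumina offset (Phred+64) or the Sanger / Illumina 1.8+ offset (Phred+33). As a rough illustration only, here is a minimal Python sketch of the usual character-range heuristic. The function names `quality_lines` and `guess_phred_offset` are hypothetical, the thresholds assume the commonly documented ASCII ranges for Illumina data, and plain four-line FASTQ records are assumed. It is not a description of what FastQC or FASTQ Groomer do internally.

```python
def quality_lines(fastq_text):
    """Return the quality string of each record.

    Assumption: plain 4-line FASTQ records (header, sequence, '+', quality),
    so the quality string is every fourth line starting at index 3.
    """
    lines = fastq_text.strip().splitlines()
    return lines[3::4]


def guess_phred_offset(qualities):
    """Guess the ASCII offset (33 or 64) from a list of quality strings.

    Heuristic (assumed ranges): characters below ASCII 59 only occur with
    the Phred+33 offset, and characters above ASCII 74 only occur with the
    older +64 offsets. Returns None when the data is empty or ambiguous.
    """
    codes = [ord(ch) for q in qualities for ch in q]
    if not codes:
        return None
    if min(codes) < 59:
        return 33
    if max(codes) > 74:
        return 64
    return None


# Hypothetical one-record example; '#' (ASCII 35) can only be Phred+33.
example = "@read1\nACGT\n+\nII#5\n"
print(guess_phred_offset(quality_lines(example)))
```

If a check like this suggests an offset of 64, that would be the case where the grooming step converts the data to fastqsanger before any filtering or trimming. A result of `None` means the sampled reads do not settle the question and more reads should be inspected.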
On Thu, Jan 2, 2014 at 6:05 PM, Jennifer Jackson <jen@bx.psu.edu> wrote:
Hello Elsayed,
Protocol help for RNA-seq analysis can be found here: https://wiki.galaxyproject.org/Support#Tools_on_the_Main_server:_RNA-seq
The QA/QC steps should be done before mapping, on individual datasets (such as replicates) or on partial or merged datasets as needed (if that is how the data was sequenced, just be consistent). Only keep in mind that the larger the dataset, the more compute some of these steps can require.
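The ordering rule above (QA/QC before mapping) can be expressed as a small check. This is only a sketch: the step names are the tools mentioned in this thread, while `QC_STEPS`, `MAPPING_STEP` and `misplaced_qc_steps` are hypothetical names chosen for illustration and are not part of Galaxy.

```python
# Tool names as mentioned in this thread; the grouping is an assumption.
QC_STEPS = {"FastQC", "FASTQ Groomer", "Filter FASTQ", "FASTQ Trimmer"}
MAPPING_STEP = "TopHat"


def misplaced_qc_steps(steps):
    """Return the QC steps that appear after the mapping step.

    `steps` is a list of tool names in the order they would run.
    An empty result means all QA/QC happens before mapping.
    """
    if MAPPING_STEP not in steps:
        return []
    after_mapping = steps[steps.index(MAPPING_STEP) + 1:]
    return [step for step in after_mapping if step in QC_STEPS]


proposed = ["FastQC", "FASTQ Groomer", "FASTQ Trimmer", "TopHat", "Cufflinks"]
print(misplaced_qc_steps(proposed))
```

A workflow that trims or filters reads after TopHat would show up in the returned list, which is the situation the advice above is meant to avoid.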
Hopefully this helps,
Jen Galaxy team
On 1/1/14 1:32 AM, Elsayed Hejazy wrote:
I need to know more about the order of steps for RNA-seq data analysis. Should alignment and assembly be done first, to combine all FASTQ reads into one file before starting the analysis? Or should I do quality control, filtering, trimming and manipulation first, and do alignment and assembly at the end of the analysis, using TopHat, for example, or other analysis tools in Galaxy?
Thank you elsayed
___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
-- Jennifer Hillman-Jackson http://galaxyproject.org
Hi Elsayed,
At the risk of overwhelming you with examples: the Genomics Virtual Lab also provides basic and advanced RNA-seq and Variant Detection tutorials, with a (small) public Galaxy instance specifically aimed at running these two tutorials.
https://genome.edu.au/wiki/Learn
Ron Horst
Project Manager, Genomics Virtual Lab
Level 6, QBP, The University of Queensland, QLD 4072
t. 07 3346 2276 | m. 0417 538 723 | e. r.horst@uq.edu.au | w. www.genome.edu.au