mitochondrial transcripts
Hello Tim,

The problem is one of three things: the reference genome build used does not contain the mitochondrial genome sequence, the reference annotation file that you are using with Cufflinks does not include mitochondrial annotation, or the identifiers in the reference genome and the reference annotation do not match (one is labeled "M" and the other "chrM" or possibly "MT").

Some links to the build:
http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/mouse/
http://www.ncbi.nlm.nih.gov/genome?term=mus%20musculus
(Open "Representative" to see the mito listed as "MT"; it is not listed under chromosomes on either link.)

If you pulled out the fasta sequence yourself and are using it as a custom reference genome on the public Main Galaxy instance at https://main.g2.bx.psu.edu/ (usegalaxy.org) or installed it locally, then it could be missing the mito chromosome/genome.

If you are using another public or private/local Galaxy (not Galaxy Main), you can contact them to find out how the genome was created (and whether it included the mito).

For either of the two above, you could also run a tool like "NGS: Picard (beta) -> BAM Index Statistics" to see whether the mito chromosome is present in the Tophat output and what it is named. If it is present, compare the chromosome identifier/name to the one in the reference annotation file you are using. The identifiers must match exactly or tools will not function properly. For more help, please see: http://wiki.galaxyproject.org/Support#Tools_on_the_Main_server

An alternative is to use either the mm9 or mm10 genome on the public Galaxy Main instance. The variants, described on the "NGS: Mapping -> Bowtie" tool form, contain the mito chromosome named "chrM". These are sourced from UCSC: http://genome.ucsc.edu/goldenPath/credits.html#mouse_credits (also navigate to "Genomes -> mammal -> mouse -> mm10" to see more details).

Hopefully this helps,

Jen
Galaxy team

On 2/19/13 9:43 AM, Brown, Tim wrote:
I have RNA-seq data that I have run through Tophat using the mouse mm10 grc38 reference genome. When I use Cufflinks on these data, the mitochondrial genome transcripts are excluded from the analysis. I would like to see that data. Is there an option within the toolbox that can remedy this? I'm a newb here.
Thanks for any help, Tim
___________________________________________________________ The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at:
-- Jennifer Hillman-Jackson Galaxy Support and Training http://galaxyproject.org
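The identifier comparison described in the reply above can also be done outside Galaxy. The following is only a minimal sketch, assuming you have already obtained the list of reference sequence names (for example from the BAM Index Statistics output) and have the annotation as a tab-delimited GTF/GFF file. The function names `gtf_seqnames` and `compare_seqnames` and the toy data are hypothetical and are not part of Galaxy, Tophat, or Cufflinks.

```python
def gtf_seqnames(lines):
    """Collect the sequence names (column 1) found in GTF/GFF lines."""
    names = set()
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment/header lines.
        if not line or line.startswith("#"):
            continue
        names.add(line.split("\t")[0])
    return names


def compare_seqnames(reference_names, annotation_names):
    """Report which names are shared and which appear on one side only."""
    ref = set(reference_names)
    ann = set(annotation_names)
    return {
        "shared": sorted(ref & ann),
        "only_in_reference": sorted(ref - ann),
        "only_in_annotation": sorted(ann - ref),
    }


# Toy data, invented for illustration: UCSC-style names on the genome side,
# Ensembl-style mito name ("MT") on the annotation side.
example_reference = ["chr1", "chrM"]
example_gtf = [
    "# toy annotation",
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'MT\tsrc\texon\t1\t50\t.\t+\t.\tgene_id "g2"; transcript_id "t2";',
]
example_report = compare_seqnames(example_reference, gtf_seqnames(example_gtf))
```

With the toy data, "chrM" shows up only on the reference side and "MT" only on the annotation side, which is the kind of mismatch that would leave mitochondrial annotation unmatched to the alignments.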
Hi Jen,

Thanks for the reply. I am using the Galaxy Main instance, and simply chose the mm10 reference genome from the pull-down option list. The BAM index statistics on one of my Tophat files report that a large number of reads are aligned to chrM... so at least the alignments are there. I used the same mm10 reference genome with Cufflinks. Since I didn't use a custom genome, it seems as though there should not be any identifier mismatches...(?)

Tim

(In reply to "Jennifer Jackson" <jen@bx.psu.edu>, 2/19/13 1:54 PM.)
Hi Tim,

This helps to explain what data you are using, thanks for writing back.

Are you using a reference annotation file with Cufflinks? Is it from mm10 (not mm9 or another mouse build)? This is where the build and/or identifiers can mismatch: between the genome used for mapping and the GTF/GFF annotation input. They have to match exactly, both in coordinates and in identifiers. For mito I wouldn't expect the build to be an issue (this record didn't change, to my knowledge), but it doesn't hurt to check.

If the annotation is from mm9 or another build, you need to find one for mm10. If it is from mm10 or GRCm38, make sure that it contains annotation for the mito genome if you are limiting assembly to the supplied reference annotation.

Also, if it is from a source like Ensembl and you just need to rename your GTF file's identifiers (in the mm10 genome on Main, the mito chromosome is named "chrM"), this is an example of how it can be done: https://main.g2.bx.psu.edu/u/jeremy/p/transcriptome-analysis-faq#faq5

If this doesn't solve the problem, you can share a history with me. Under the History panel (far right) is a gear icon that opens a pull-down menu. Select 'Share or Publish', and on the next page click on the first button to create a share link. Copy and paste that link into a reply email to just me (don't cc the mailing list, to keep your data private), and I can take a look. Please note which Cufflinks dataset you are concerned about (one of the outputs, by #). Make sure that all inputs and outputs in the analysis are undeleted (undelete them before sending to me if necessary).

Hopefully you can find the problem, but I will watch for your email if not,

Jen
Galaxy team

(In reply to Brown, Tim, 2/19/13 12:28 PM.)
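For anyone renaming identifiers outside Galaxy, the idea can be sketched in a few lines of Python. This is not the method from the FAQ linked above, only an illustration under stated assumptions: the annotation uses Ensembl-style names (bare numbers, "X", "Y", "MT"), the genome uses UCSC-style names ("chr1", "chrX", "chrM"), and anything else (scaffolds, names already carrying "chr") is left untouched. The helpers `to_ucsc_name` and `rename_gtf_line` are hypothetical.

```python
def to_ucsc_name(name):
    """Map an Ensembl-style sequence name to a UCSC-style one (assumed mapping)."""
    if name == "MT":
        return "chrM"
    if name.isdigit() or name in ("X", "Y"):
        return "chr" + name
    # Scaffolds and already-prefixed names are returned unchanged.
    return name


def rename_gtf_line(line):
    """Rewrite column 1 of one GTF line; comments and blank lines pass through."""
    if line.startswith("#") or not line.strip():
        return line
    fields = line.split("\t")
    fields[0] = to_ucsc_name(fields[0])
    return "\t".join(fields)


# Toy line, invented for illustration.
example_in = 'MT\tsrc\texon\t1\t50\t.\t+\t.\tgene_id "g2"; transcript_id "t2";\n'
example_out = rename_gtf_line(example_in)
```

Only the first column changes; coordinates and attributes are passed through as they are, so this helps only when the annotation is already on the same build as the genome.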
participants (2)
- Brown, Tim
- Jennifer Jackson