Hi Zain,
I believe we already worked out the .fastqsanger/grooming part of this question in another thread. But for others reading this post, here is a help link (see "FASTQ"): http://wiki.galaxyproject.org/Support#Dataset_special_cases
Our RNA-seq exercise covers an example workflow: https://main.g2.bx.psu.edu/u/jeremy/p/galaxy-rna-seq-analysis-exercise
Best,
Jen
Galaxy team
On 5/3/13 8:59 PM, Zain A Alvi wrote:
Dear Sir or Madam,
I hope this reaches you well. Lately, I have been trying to use TopHat and then Bowtie on the Galaxy Project to create an aligned BAM file. The original data came from an SRA file that I obtained from the DNA Data Bank of Japan (DDBJ). I converted this SRA file to FASTQ using the tools available on Galaxy. However, when I open TopHat on Galaxy, I am unable to select the converted RNA-Seq FASTQ file. Is there a specific format the file needs to be in? Currently it is just a *.fastq file, and I am confused as to why I cannot select it.
Also, if there is a guide on how to use Galaxy to create an aligned BAM file and then check expression with the Cufflinks package, I would really appreciate it.
Sincerely,
Zain
___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
-- Jennifer Hillman-Jackson Galaxy Support and Training http://galaxyproject.org
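For readers hitting the same TopHat problem: TopHat on Galaxy only lists FASTQ datasets with the fastqsanger datatype (Sanger, Phred+33 quality scores), so a dataset that is simply typed as "fastq" does not appear in the menu. Before relabelling the dataset or running it through FASTQ Groomer, it can help to guess which quality encoding the file really uses. The sketch below is a heuristic, not a Galaxy tool; it assumes unwrapped four-line FASTQ records and classifies the file by the lowest ASCII code seen in the quality lines.

```python
import sys
from typing import Iterable, Iterator


def quality_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield the quality string (4th line) of each FASTQ record.

    Assumes unwrapped records of exactly four lines each.
    """
    for i, line in enumerate(lines):
        if i % 4 == 3:
            yield line.rstrip("\n")


def guess_encoding(lines: Iterable[str], max_records: int = 10000) -> str:
    """Guess the quality-score offset from the range of ASCII codes seen."""
    lowest, highest = 255, 0
    for n, qual in enumerate(quality_lines(lines)):
        if n >= max_records:
            break
        for ch in qual:
            lowest = min(lowest, ord(ch))
            highest = max(highest, ord(ch))
    if highest == 0:
        return "unknown (no quality characters found)"
    if lowest < 59:
        # Codes below 59 (';') never occur with a +64 offset, so this is +33.
        return "Phred+33 (Sanger, i.e. Galaxy's fastqsanger)"
    if lowest >= 64:
        # Probably an older Illumina +64 file that needs grooming to Sanger.
        return "likely Phred+64 (Illumina 1.3-1.7)"
    # Codes 59-63 fit both Solexa+64 and high-quality Phred+33 data.
    return "ambiguous (Solexa+64 or Phred+33)"


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        print(guess_encoding(handle))
```

Usage would be something like `python guess_fastq_encoding.py reads.fastq`, where both names are placeholders. A "Phred+33" result suggests the dataset can be set to fastqsanger directly; anything else points to FASTQ Groomer first.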
Hi Jen,
Thank you for the FASTQ information; it was really helpful.
Lately, I have been getting the following error: "Error getting history update from this server- Bad Gateway". This occurred after I tried to re-upload some pre-aligned and indexed BAM files from NCBI GEO, because I was hoping to generate and retrieve FPKM/RPKM values from them.
Unfortunately, my old files are still not available on Galaxy, and I get an Internal Server Error when trying to retrieve them, although I can still get the workflow for them.
The last odd error is that when I use Cuffdiff, I get an FPKM of 0 with p/q values of 1 every time. This should not be the case, as the BAM files are from two different organs. It happens for every single gene, which indicates that something is wrong. I retrieved the GTF file from UCSC Main with the following settings:
Insect - D. pseudoobscura; Group: Genes and Gene Prediction Tracks; Track: FlyBase; Table: FlybaseGene; Output format: GTF.
Should these settings be fine, or should I change the Group to mRNA or some other setting? Note that the file available on UCSC is the old dp3 file from 2004, while the latest GFF on FlyBase is 3.1. Is there any way to convert that to a GTF file?
Sorry for so many questions. Thank you again for the great help.
Sincerely,
Zain
________________________________
From: Jennifer Jackson [jen@bx.psu.edu]
Sent: Tuesday, May 07, 2013 3:21 PM
To: Zain A Alvi
Cc: galaxy-dev@bx.psu.edu
Subject: Re: [galaxy-dev] Tophat problem
Hi Zain,
On 5/19/13 1:35 PM, Zain A Alvi wrote:
Hi Jen,
Thank you for the FASTQ information. It was really helpful.
Lately, I have been getting the following error: "Error getting history update from this server- Bad Gateway". This occurred after I tried to re-upload some pre-aligned and indexed BAM files from NCBI GEO, because I was hoping to generate and retrieve FPKM/RPKM values from them.
This has now been resolved; very sorry for the confusion it caused.
Unfortunately, my old files are still not available on Galaxy, and I get an Internal Server Error when trying to retrieve them, although I can still get the workflow for them.
Same, resolved now.
The last odd error is that when I use Cuffdiff, I get an FPKM of 0 with p/q values of 1 every time. This should not be the case, as the BAM files are from two different organs. It happens for every single gene, which indicates that something is wrong. I retrieved the GTF file from UCSC Main with the following settings:
Insect - D. pseudoobscura; Group: Genes and Gene Prediction Tracks; Track: FlyBase; Table: FlybaseGene; Output format: GTF.
Should these settings be fine, or should I change the Group to mRNA or some other setting? Note that the file available on UCSC is the old dp3 file from 2004, while the latest GFF on FlyBase is 3.1. Is there any way to convert that to a GTF file?
I can't recommend a conversion tool, but there are a few on the web that could be tested, if you decide to go that route. I do know that certain GFF3 files obtained directly from FlyBase have been problematic with the RNA-seq tools due to duplicated "ID" attributes. I don't know whether this affects all versions or just the dm3 version. That said, the issue has been isolated to a few records (a gene mapping to more than one location), and there isn't any reason why you shouldn't test the D. pseudoobscura version and then adjust it, if needed.
The GTF file from the UCSC Table Browser is correct, but Cuffdiff is looking for attributes that this version of the file does not have. If you look at the 9th field of the file to examine these attributes and compare it to the Cuffdiff input documentation, you can see how they differ. The gene_id and transcript_id have the same value, and other attributes such as tss_id and p_id are not present. There is nothing wrong with the file, but without these attributes populated in a particular way, certain calculations will not be done.
http://cufflinks.cbcb.umd.edu/manual.html
These variations come from different projects following slightly different file specifications. Some are content variations, some are format variations. This is common with this file type family (GFF, GTF, GFF3), and it is why iGenomes creates files specifically for certain genomes for use with this tool set.
When you do obtain a file that has the format and content you want to use, double check that the chromosome names are *exactly* the same between the reference genome, the TopHat output, and the GTF or GFF3 file. Mismatches can also lead to calculations being missed.
http://wiki.galaxyproject.org/Support#Tools_on_the_Main_server
iGenomes did not produce a file for this fruit fly species, but you could request one from them. This is where they publish the data for other genomes, and there is a link to the project at the top of the page:
http://cufflinks.cbcb.umd.edu/igenomes.html
Good luck with your project,
Jen
Galaxy team
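Jen's suggestion to inspect the 9th (attribute) field of the GTF can be automated. The sketch below is a hypothetical helper, not part of Galaxy or Cufflinks: it assumes GTF-style attributes written as `key "value";` pairs, counts how many exon lines carry each of gene_id, transcript_id, tss_id and p_id, and counts how often gene_id equals transcript_id, which is the pattern described for the UCSC Table Browser export.

```python
import re
import sys
from collections import Counter
from typing import Dict, Iterable

# GTF attribute pairs look like: gene_id "GA12345"; transcript_id "GA12345-RA";
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')
CHECKED = ("gene_id", "transcript_id", "tss_id", "p_id")


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse a GTF 9th column into a dict; the first value of each key wins."""
    attrs: Dict[str, str] = {}
    for key, value in ATTR_RE.findall(field):
        attrs.setdefault(key, value)
    return attrs


def summarize(lines: Iterable[str]) -> Dict[str, int]:
    """Count exon lines carrying each attribute, plus gene_id == transcript_id."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        counts["exons"] += 1
        attrs = parse_attributes(cols[8])
        for key in CHECKED:
            if key in attrs:
                counts[key] += 1
        gene_id = attrs.get("gene_id")
        if gene_id is not None and gene_id == attrs.get("transcript_id"):
            counts["gene_id==transcript_id"] += 1
    return dict(counts)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        summary = summarize(handle)
    total = summary.get("exons", 0)
    for key in CHECKED + ("gene_id==transcript_id",):
        print(f"{key}: {summary.get(key, 0)} of {total} exon lines")
```

A file where tss_id and p_id are counted on zero lines, and gene_id==transcript_id on every line, matches the situation Jen describes; which Cuffdiff outputs depend on those attributes is documented in the Cufflinks manual linked above.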
Sorry for so many questions. Thank you again for the great help.
Sincerely,
Zain
-- Jennifer Hillman-Jackson Galaxy Support and Training http://galaxyproject.org
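Jen's advice to check that chromosome names match *exactly* between the reference genome, the TopHat output and the GTF/GFF3 file can be checked with a short script. This is a minimal sketch under stated assumptions: the reference is a FASTA file whose sequence names are the first word of each `>` header, and the annotation is tab-separated with the sequence name in column 1. The TopHat BAM's `@SQ SN:` names (for example from `samtools view -H`) could be compared the same way; that step is not shown here.

```python
import sys
from typing import Iterable, Set, Tuple


def fasta_names(lines: Iterable[str]) -> Set[str]:
    """Sequence names from FASTA headers: the first word after '>'."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def annotation_names(lines: Iterable[str]) -> Set[str]:
    """Sequence names (column 1) used in a GTF/GFF file, skipping comments."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def compare(reference: Set[str], annotation: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (names only in the annotation, names only in the reference)."""
    return annotation - reference, reference - annotation


if __name__ == "__main__" and len(sys.argv) > 2:
    with open(sys.argv[1]) as fasta, open(sys.argv[2]) as gtf:
        only_ann, only_ref = compare(fasta_names(fasta), annotation_names(gtf))
    # Features on sequences missing from the reference cannot overlap any alignment.
    print("In annotation but not reference:", sorted(only_ann))
    print("In reference but not annotation:", sorted(only_ref))
```

A non-empty first list (for instance "chr2" in the GTF versus "2" in the FASTA) is the kind of mismatch that would leave genes without any overlapping reads.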
participants (2)
- Jennifer Jackson
- Zain A Alvi