Hi Zain,
On 5/19/13 1:35 PM, Zain A Alvi wrote:
Hi Jen,
Thank you for the information regarding the FastQ information.
It was really helpful.
Lately, I have been getting the following error: "Error getting
history update from this server- Bad Gateway". This occurred
after I tried to re-upload some pre-aligned and indexed BAM
files from NCBI GEO, because I was hoping to generate and
retrieve FPKM/RPKM values from them.
This has now been resolved, very sorry for the confusion it caused.
Unfortunately, my old files are still not available on
Galaxy, and I get an Internal Server Error when trying to
retrieve them, although I can still get the workflow for them.
Same, resolved now.
The last weird error is that when I use Cuffdiff, I get an FPKM of
0 with p/q values of 1 all the time. This should not be the
case, as the BAM files are from two different organs. It happens for
every single gene, which indicates that something is wrong.
I was able to retrieve the GTF file from UCSC main with the
following settings:
Insect: D. pseudoobscura
Group: Genes and Gene Prediction Tracks
Track: Flybase
Table: FlybaseGene
Output format: GTF
I was wondering whether these settings are fine or whether I should
change the Group to mRNA or some other setting. The file that
is available on UCSC is the old dp3 file from 2004, while the latest GFF
on Flybase is 3.1. Is there any way to convert that to a GTF file?
I can't recommend a conversion tool, but there are a few on the web
that could be tested out, if you decide to go that route. I do know
that certain GFF3 files directly from FlyBase have been problematic
with the RNA-seq tools due to duplicated "ID" attributes. I don't
know whether this affects all versions or just the dm3 version. That
said, the issue has been isolated to a few records (a gene mapping
to >1 location), and there isn't any reason why you shouldn't
test out the D. pseudoobscura version and then adjust it, if
needed.
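If you want to check a GFF3 file for repeated "ID" values before running the tools, something like the following could be a starting point. It is only a rough sketch: the function name is made up for illustration, and it assumes the usual tab-separated layout with key=value pairs in the 9th column. Note that features split across several lines (for example a multi-part CDS) can legitimately share an ID, so the output is a list of records to inspect, not a list of errors.

```python
from collections import Counter


def duplicated_gff3_ids(lines):
    """Return {ID: count} for every ID attribute seen on more than one line.

    `lines` is any iterable of GFF3 lines (e.g. an open file handle).
    """
    counts = Counter()
    for line in lines:
        # Skip blank lines, comments and directives such as ##gff-version.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete feature line
        # Column 9 holds semicolon-separated key=value pairs.
        for pair in cols[8].split(";"):
            key, sep, value = pair.strip().partition("=")
            if sep and key == "ID":
                counts[value] += 1
    return {feature_id: n for feature_id, n in counts.items() if n > 1}
```

To use it, open the downloaded file and pass the handle in, e.g. `duplicated_gff3_ids(open("annotation.gff3"))`, where the file name is a placeholder.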
The GTF file from the UCSC Table browser is correct, but Cuffdiff is
looking for attributes that this version of the file does not have.
If you look at the 9th field of the file to examine these attributes
and compare it to the Cuffdiff input documentation, you can see how
these differ. The gene_id and transcript_id are the same value, and
other attributes, such as tss_id and p_id, are not present. There is
nothing wrong with the file, but without these attributes populated
a particular way, certain calculations will not be done.
http://cufflinks.cbcb.umd.edu/manual.html
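As a quick way to do the 9th-field check described above, a small script along these lines could summarize which attributes a GTF file actually carries. This is a sketch under assumptions: the function names are invented for illustration, only quoted attribute values (key "value";) are picked up, and the list of attributes to look for is taken from the ones mentioned in this message.

```python
import re
from collections import Counter

# Matches GTF attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def parse_gtf_attributes(field9):
    """Return the quoted key/value pairs of a GTF attribute column as a dict."""
    return dict(ATTR_RE.findall(field9))


def summarize_gtf_attributes(lines):
    """Count attribute keys and records where gene_id equals transcript_id.

    Returns (number_of_records, Counter_of_attribute_keys, same_id_count).
    """
    seen = Counter()
    same_ids = 0
    records = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete feature line
        attrs = parse_gtf_attributes(cols[8])
        records += 1
        seen.update(attrs.keys())
        gene_id = attrs.get("gene_id")
        if gene_id is not None and gene_id == attrs.get("transcript_id"):
            same_ids += 1
    return records, seen, same_ids


def report(lines, wanted=("gene_id", "transcript_id", "tss_id", "p_id")):
    """Print how many records carry each attribute of interest."""
    records, seen, same_ids = summarize_gtf_attributes(lines)
    for key in wanted:
        print(f"{key}: present on {seen[key]} of {records} records")
    print(f"gene_id == transcript_id on {same_ids} of {records} records")
```

Calling `report(open("genes.gtf"))` (the file name is a placeholder) on the Table Browser output should show the pattern described above: gene_id and transcript_id on every record with identical values, and zero records with tss_id or p_id.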
These variations are just different projects following a slightly
different file specification. Some are content variations, some are
format variations. This is common with this file type family (GFF,
GTF, GFF3). This is why iGenomes creates files specifically for
certain genomes for use with this tool set.
When you do obtain a file that has the format and content you want
to use, double check that the chromosome names are *exactly* the
same between the reference genome, Tophat output, and GTF or GFF3
file. Mismatches can also lead to calculations being missed.
http://wiki.galaxyproject.org/Support#Tools_on_the_Main_server
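The chromosome-name check can also be scripted. The sketch below compares the sequence names in a reference FASTA with the first column of a GTF or GFF3 file and lists names that appear in only one of them. The function names are hypothetical, and it assumes the FASTA identifier is the text between ">" and the first whitespace.

```python
def fasta_names(lines):
    """Collect sequence identifiers from FASTA header lines."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])
    return names


def annotation_names(lines):
    """Collect the first-column sequence names from GTF/GFF3 feature lines."""
    names = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0].strip())
    return names


def compare_names(reference, annotation):
    """Return (names only in the annotation, names only in the reference)."""
    return sorted(annotation - reference), sorted(reference - annotation)
```

Names that show up only on the annotation side (for example "2" versus "chr2") are the ones most likely to cause calculations to be skipped. The names in the TopHat BAM header could be compared the same way as a third set; reading them is left out here because it needs an additional tool.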
iGenomes did not produce a file for fruit fly, but you could request
one from them. This is where they publish the data for other
genomes, and there is a link to the project at the top of the page:
http://cufflinks.cbcb.umd.edu/igenomes.html
Good luck with your project,
Jen
Galaxy team
Sorry for so many questions. Thank you again for the great help.
Sincerely,
Zain
--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org