Exceptionally high RPKM values of miRNA and other short genes in Cuffdiff's output
by Hoang, Thanh
Hi all,
I have been analyzing my RNA-seq data from mouse tissues. My RNA-seq data are
single-end, 51 bp reads. I ran TopHat/Cufflinks/Cuffdiff to test for
differential gene expression.
In Cuffdiff's output, I got very high RPKM values for some miRNAs and
some other short genes (less than 100 bp). These genes are among the genes
with the highest RPKM. I think the RPKM values of these genes are probably
too high to be true.
test_id             gene_id             gene     locus                   sample_1    sample_2  status  value_1   value_2  log2(fold_change)  test_stat  p_value  q_value   significant
ENSMUSG00000093077  ENSMUSG00000093077  Mir5105  5:146231229-146302874   Epithelium  Fiber     OK      1.53E+06  445558   -1.78097           -355.367   0.00715  0.016986  yes
ENSMUSG00000093098  ENSMUSG00000093098  Gm22641  7:130162450-133124354   Epithelium  Fiber     OK      87894.1   36474.7  -1.26887           -0.59863   0.4913   0.587174  no
ENSMUSG00000089855  ENSMUSG00000089855  Gm15662  10:105187662-105583874  Epithelium  Fiber     OK      42868.9   21566.5  -0.99114           -20.7066   0.0186   0.039568  yes
ENSMUSG00000092984  ENSMUSG00000092984  Mir5115  2:73012853-73012927     Epithelium  Fiber     OK      21104.8   8317.49  -1.34335           -447.314   0.0001   0.000354  yes
ENSMUSG00000086324  ENSMUSG00000086324  Gm15564  16:35926510-36037131    Epithelium  Fiber     OK      6443.35   3664.15  -0.81433           -1.52095   0.2129   0.301429  no
ENSMUSG00000092981  ENSMUSG00000092981  Mir5125  17:23803186-23824739    Epithelium  Fiber     OK      5974.14   2390.75  -1.32127           -0.34111   0.5746   0.661937  no
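To rank and sanity-check a Cuffdiff gene table like this one, a short script can help. The sketch below is a minimal example, assuming the tab-delimited gene_exp.diff layout whose column names appear in the header row above; SAMPLE is a trimmed copy of the rows above, keeping only the columns the script uses. It ranks genes by the higher of their two FPKM values and recomputes log2(value_2 / value_1), which matches the reported log2(fold_change) column, so the fold change appears to be derived directly from the same FPKM estimates that look inflated.

```python
import csv
import io
import math


def read_gene_exp(handle):
    """Parse a tab-delimited Cuffdiff gene table (e.g. gene_exp.diff) into dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def rank_by_fpkm(rows, n=10):
    """Return the n rows with the highest FPKM in either sample, largest first."""
    return sorted(
        rows,
        key=lambda r: max(float(r["value_1"]), float(r["value_2"])),
        reverse=True,
    )[:n]


def recomputed_log2fc(row):
    """Recompute log2(value_2 / value_1); None when either FPKM is zero."""
    v1, v2 = float(row["value_1"]), float(row["value_2"])
    if v1 == 0 or v2 == 0:
        return None  # fold change is infinite or undefined here
    return math.log2(v2 / v1)


# Trimmed copy of the table above (only the columns used by this sketch).
SAMPLE = "\n".join("\t".join(fields) for fields in [
    ("gene", "value_1", "value_2", "log2(fold_change)"),
    ("Mir5105", "1.53E+06", "445558", "-1.78097"),
    ("Gm22641", "87894.1", "36474.7", "-1.26887"),
    ("Gm15662", "42868.9", "21566.5", "-0.99114"),
    ("Mir5115", "21104.8", "8317.49", "-1.34335"),
    ("Gm15564", "6443.35", "3664.15", "-0.81433"),
    ("Mir5125", "5974.14", "2390.75", "-1.32127"),
])

rows = read_gene_exp(io.StringIO(SAMPLE))
for row in rank_by_fpkm(rows, n=3):
    print(row["gene"], row["value_1"], row["value_2"],
          f"reported={row['log2(fold_change)']}",
          f"recomputed={recomputed_log2fc(row):.5f}")
```

The small differences in the recomputed values come from rounding in the printed FPKM columns (e.g. 1.53E+06).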
I checked some forums, and they said that this is a drawback of
TopHat/Cufflinks/Cuffdiff when dealing with short genes. But I am still not
clear about this. Has anyone had the same problem? What can I do in this
situation?
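Part of the length effect comes from the RPKM formula itself, which divides by feature length. The sketch below uses the textbook definition, RPKM = 10^9 * C / (N * L), where C is the number of reads on the feature, N the total number of mapped reads, and L the feature length in bp. The library size and read counts are hypothetical. Cufflinks reports FPKM and, as we understand it, divides by an effective length (roughly transcript length minus mean fragment length), which shrinks even further for very short features; that refinement is not modelled here.

```python
def rpkm(read_count: int, length_bp: int, total_mapped_reads: int) -> float:
    """Textbook RPKM: reads per kilobase of feature per million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)


TOTAL_MAPPED = 20_000_000  # hypothetical library size

# The same hypothetical 100 reads on a 75 bp feature (the span of the Mir5115
# locus above) versus on a hypothetical 3,000 bp transcript.
short_feature = rpkm(100, 75, TOTAL_MAPPED)
long_feature = rpkm(100, 3000, TOTAL_MAPPED)

print(f"75 bp feature:   {short_feature:.2f} RPKM")
print(f"3000 bp feature: {long_feature:.2f} RPKM")
print(f"ratio: {short_feature / long_feature:.0f}x")
```

With identical read counts, the 75 bp feature scores 40 times higher than the 3,000 bp one, so a modest number of reads on a miRNA-sized feature is enough to push it near the top of an RPKM ranking.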
Can anyone suggest other good tools to test for (1) differential gene
expression or (2) both differential gene expression and gene discovery?
Thank you
Thanh
9 years
Problem with FTP
by GANDRILLON OLIVIER
Hi
I am trying to connect through FTP to transfer large files. Up to today everything went fine, but today I tried many times to connect and each time the FTP connection was instantly interrupted. Thanks for your help.
Best
Olivier
9 years
Amplicon Sequencing Questions
by Seth Stern
Hi Everyone,
I've got an amplicon that has fixed and variable sequence segments, and I'm
sequencing it with 180-base MiSeq reads. I'd like to align the reads using
the fixed segment. Is this an appropriate job for either Bowtie or BWA
(using a reference sequence that is necessarily shorter than the reads)?
If not, how should I do it?
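One simple way to use the fixed segment is to locate it inside each read before doing anything else. The sketch below swaps in a plain Hamming-distance scan rather than Bowtie or BWA: it slides the fixed segment along the read, tolerates a small number of mismatches but no insertions or deletions, and splits the read into the flanking variable parts. The anchor sequence, the example read, and the mismatch threshold are all hypothetical; reads from the opposite strand would need to be reverse-complemented first.

```python
from typing import Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_anchor(read: str, anchor: str,
                max_mismatches: int = 2) -> Optional[Tuple[int, int]]:
    """Best (offset, mismatches) placement of the anchor within the read, or None."""
    best = None
    for start in range(len(read) - len(anchor) + 1):
        mismatches = hamming(read[start:start + len(anchor)], anchor)
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (start, mismatches)
    return best


def split_on_anchor(read: str, anchor: str,
                    max_mismatches: int = 2) -> Optional[Tuple[str, str]]:
    """Return the (upstream, downstream) variable segments around the anchor, or None."""
    hit = find_anchor(read, anchor, max_mismatches)
    if hit is None:
        return None
    start, _ = hit
    return read[:start], read[start + len(anchor):]


ANCHOR = "GATCCTAGGA"                    # hypothetical fixed segment
read = "TTTT" + "GATCGTAGGA" + "AAAC"    # hypothetical read, one mismatch in the anchor
print(find_anchor(read, ANCHOR))
print(split_on_anchor(read, ANCHOR))
```

A scan like this is quadratic in the worst case but cheap for 180-base reads and a short anchor; it would miss anchors that carry indels, which is where a real aligner in local mode would do better.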
Thanks,
Seth
Seth Stern, Ph.D.
sethster(a)comcast.net
650 996 8726 (cell)
9 years
database associated with galaxy server
by Jennifer Jackson
Hello Mitali,
There is a rice genome available for use with most tools already on the
public Main Galaxy server at https://main.g2.bx.psu.edu
(http://usegalaxy.org). On the tool form, for the option to select a
reference genome, or on the Upload or Edit Attributes form when
selecting genome/build, type in the keyword "rice" to bring up:
Rice (Oryza sativa): oryza_sativa_japanica_nipponbare_IRGSP4.0
For genomes not included as built-in native indexes (perhaps you wish to
use a different strain), a Custom reference genome can also be used with
most tools on Main. Instructions are here:
http://wiki.galaxyproject.org/Support#Custom_reference_genome
There are a few tools on Main that do not allow for Custom reference
genomes - these have a fixed set of specified target genomes. Megablast
is one such tool. To use an alternate genome with this tool and the few
others like it, a local or cloud Galaxy is required. Instructions are here:
http://getgalaxy.org
http://usegalaxy.org/cloud
If you need to add a reference genome to a local Galaxy (a cloud Galaxy
would already include everything currently on Main), you can create the
indexes yourself, or rsync our files for use in your instance. The rsync
server also lists all genomes currently indexed on Main. Instructions
are here:
http://wiki.galaxyproject.org/Admin/Data%20Integration
http://wiki.galaxyproject.org/Admin/NGS%20Local%20Setup
Hopefully this helps,
Jen
Galaxy team
http://wiki.galaxyproject.org/Support#Mailing_Lists
On 7/17/13 11:43 PM, Mitali Merchant wrote:
> hi,
>
> I wanted to know the genome databases associated with the Galaxy server. I
> want to perform peak calling on Oryza sativa (rice genome) data using the
> Galaxy server, but since this genome is not available in Galaxy,
> I want to know how I can get this genome as a reference.
> Kindly reply as soon as possible; it will be of great help.
>
> Mitali Merchant
> IIIT,Hyderabad
> India
--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org
9 years
Problem with repeated genes in Cuffdiff's output
by Hoang, Thanh
Hi all,
I am working on RNA-seq using TopHat/Cufflinks/Cuffdiff for differential
gene expression and new gene discovery (this is what I am interested in).
However, I found many genes that are repeated in Cuffdiff's output.
These are the same genes at exactly the same locus, whereas there should be
only one line per gene. Something like this:
Gene  Locus                Status  q1       q2       Log2 fold change  Significance
Lnp   2:74517521-74584544  OK      8.91501  85.2735  3.25779           yes
Lnp   2:74517521-74584544  OK      12.0044  171.352  3.83533           yes
If I re-run Cuffdiff for differential gene expression only (no gene
discovery), the problem goes away. Does anyone know how to explain and fix this?
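To see how widespread the duplication is, the rows can be grouped by gene name and locus. The sketch below is a minimal example, assuming the tab-delimited gene_exp.diff layout (columns such as gene, locus, value_1, value_2); SAMPLE holds only the two Lnp rows above. In the full file, comparing the test_id and gene_id columns within each duplicated group may show whether Cuffdiff is treating the rows as separate loci that happen to share a name and coordinates.

```python
import csv
import io
from collections import defaultdict


def duplicated_genes(handle):
    """Group Cuffdiff rows by (gene, locus) and keep groups with more than one row."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        groups[(row["gene"], row["locus"])].append(row)
    return {key: rows for key, rows in groups.items() if len(rows) > 1}


# The two Lnp rows above, in gene_exp.diff column naming (subset of columns).
SAMPLE = "\n".join("\t".join(fields) for fields in [
    ("gene", "locus", "value_1", "value_2"),
    ("Lnp", "2:74517521-74584544", "8.91501", "85.2735"),
    ("Lnp", "2:74517521-74584544", "12.0044", "171.352"),
])

for (gene, locus), rows in duplicated_genes(io.StringIO(SAMPLE)).items():
    print(f"{gene} at {locus}: {len(rows)} rows")
```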
Thank you so much
9 years
Emptying my account for good
by GANDRILLON OLIVIER
Hi
I am trying to get rid of all my previous files in order to upload new ones. I usually use "delete permanently" for that, but this time I only used "delete". And now I am stuck: the site says I am using 69% of my account, although my history is empty and everything is empty, so 100% should be available for new analyses...
I tried "purge deleted datasets", but to no avail.
Thanks for your help
Best
Olivier Gandrillon
9 years
Error running Metagenomic Analysis
by Pavan Kumar
Hello,
I have created a workflow for metagenomic analysis.
I am using the fetch taxonomic representation tool, but it shows me the
error "ERROR in line 1: Invalid ID tag" irrespective of the data.
A few questions have been raised about the same error in the blog, but the
solutions are not convincing.
Please help me troubleshoot this error.
Thanks in advance
--
Pavan
9 years
Evaluating TopHat's results
by Hoang, Thanh
Hi,
I ran TopHat on Galaxy for my RNA-seq data. I want to analyze TopHat's
output files, e.g. the percentage of reads mapped to the genome, but I am
not sure how to do that.
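One way to estimate the mapping rate is to divide the number of distinct read names that have an alignment by the number of input reads. The sketch below is a minimal example, assuming single-end reads, four-line FASTQ records, and SAM text such as that printed by `samtools view` on TopHat's accepted_hits.bam. As far as we know, accepted_hits.bam holds only aligned reads, and a multi-mapped read can appear in it more than once, so the denominator is taken from the input FASTQ rather than from the BAM. The reads in the usage example are hypothetical.

```python
def count_fastq_reads(fastq_lines):
    """Count reads in a FASTQ file with four lines per record (blank lines ignored)."""
    return sum(1 for line in fastq_lines if line.strip()) // 4


def mapped_read_names(sam_lines):
    """Names of reads with at least one alignment in SAM text (header lines skipped)."""
    names = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & 0x4:  # FLAG bit 0x4: segment unmapped
            continue
        names.add(qname)  # a set, so multi-mapped reads are counted once
    return names


def percent_mapped(fastq_lines, sam_lines):
    """Percentage of input reads that have at least one alignment."""
    total = count_fastq_reads(fastq_lines)
    if total == 0:
        return 0.0
    return 100.0 * len(mapped_read_names(sam_lines)) / total


# Hypothetical example: three reads, r1 mapped twice, r2 mapped once, r3 unmapped.
FASTQ = ["@r1", "ACGT", "+", "IIII",
         "@r2", "GGCC", "+", "IIII",
         "@r3", "TTAA", "+", "IIII"]
SAM = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t0\tchr1\t100\t50\t4M\t*\t0\t0\tACGT\tIIII",
    "r1\t256\tchr2\t500\t3\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr1\t900\t50\t4M\t*\t0\t0\tGGCC\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tTTAA\tIIII",
]
print(f"{percent_mapped(FASTQ, SAM):.1f}% of reads mapped")
```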
I am also trying to visualize the BAM file in IGB, but the following error
message appears: "Failed to authenticate to the server".
Can anyone help with these issues?
Thanks so much
Thanh
9 years
Extreme Newbie Question
by Seth Stern
Hi Everyone,
I recently uploaded about 1.8 GB of MiSeq reads (using FTP). The upload
seemed to go fine, but Galaxy seems to be very unhappy with the file. I'm
seeing the following message in History:
1: miseq_R1-03-07-2013.zip
empty
format: txt, database: ?
ZIP file contained more than one file, only the first file was added to
Galaxy.
Can anyone tell me how to diagnose the problem?
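The message suggests that Galaxy kept only the first entry of the archive. If that first entry happens to be a folder or an empty file, the dataset would come out empty, although that is only a guess. Listing the archive locally shows what the first entry actually is. The sketch below uses Python's standard zipfile module; the archive built in the usage example, including its folder and file names, is hypothetical.

```python
import io
import zipfile


def describe_zip(source):
    """Return (name, uncompressed size in bytes, is_directory) for each member, in archive order."""
    with zipfile.ZipFile(source) as zf:
        return [(info.filename, info.file_size, info.is_dir()) for info in zf.infolist()]


# Usage: build a small in-memory archive (hypothetical layout) and list it.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("miseq/", "")  # a folder entry, stored first
    zf.writestr("miseq/reads_R1.fastq", "@r1\nACGT\n+\nIIII\n")
buf.seek(0)

for name, size, is_dir in describe_zip(buf):
    print(f"{name}\t{size} bytes\t{'folder' if is_dir else 'file'}")
```

On a real upload, the same function can be called with the path to the .zip file; if the first entry is a folder or a zero-byte file, uploading the FASTQ file on its own (or an archive containing only that file) is worth trying.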
Many thanks,
Seth
Seth Stern, Ph.D.
sethster(a)comcast.net
650 996 8726 (cell)
9 years
bugs of snpSiftWrapper.pl corrected
by Yufei LUO
Hi,
For those who use the snpEff V3_3c release in Galaxy: there are two small
bugs in snpSiftWrapper.pl.
You can find the corrected file in the attachment.
Best Regards,
--
Yufei LUO
Bioinformatician
Institut Gustave Roussy
Bâtiment de Médecine Moléculaire
Bureau N°136
114, rue Edouard Vaillant
94805 Villejuif Cedex - France
9 years