I have been analyzing RNA-seq data from mouse tissues. The reads are
single-end and 51 bp long. I ran TopHat/Cufflinks/Cuffdiff to test for
differential gene expression.
In the Cuffdiff output, I got very high RPKM values for some miRNAs and
some other short genes (less than 100 bp). These genes are among the genes
with the highest RPKM. I think the RPKM values of these genes are probably
too high to be true.
test_id             gene_id             gene     locus                   sample_1    sample_2  status  value_1   value_2  log2(fold_change)  test_stat  p_value  q_value   significant
ENSMUSG00000093077  ENSMUSG00000093077  Mir5105  5:146231229-146302874   Epithelium  Fiber     OK      1.53E+06  445558   -1.78097           -355.367   0.00715  0.016986  yes
ENSMUSG00000093098  ENSMUSG00000093098  Gm22641  7:130162450-133124354   Epithelium  Fiber     OK      87894.1   36474.7  -1.26887           -0.59863   0.4913   0.587174  no
ENSMUSG00000089855  ENSMUSG00000089855  Gm15662  10:105187662-105583874  Epithelium  Fiber     OK      42868.9   21566.5  -0.99114           -20.7066   0.0186   0.039568  yes
ENSMUSG00000092984  ENSMUSG00000092984  Mir5115  2:73012853-73012927     Epithelium  Fiber     OK      21104.8   8317.49  -1.34335           -447.314   0.0001   0.000354  yes
ENSMUSG00000086324  ENSMUSG00000086324  Gm15564  16:35926510-36037131    Epithelium  Fiber     OK      6443.35   3664.15  -0.81433           -1.52095   0.2129   0.301429  no
ENSMUSG00000092981  ENSMUSG00000092981  Mir5125  17:23803186-23824739    Epithelium  Fiber     OK      5974.14   2390.75  -1.32127           -0.34111   0.5746
I checked some forums, and they said that this is a drawback of
TopHat/Cufflinks/Cuffdiff when dealing with short genes. But I am still not
clear about this. Has anyone had the same problem? What can I do about it?
Can anyone suggest other good tools to test for (1) differential gene
expression OR (2) both differential gene expression and gene discovery?
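For readers following this thread, the length effect can be illustrated with the standard RPKM formula, RPKM = 10^9 * C / (N * L), where C is the number of reads mapped to the gene, N the total number of mapped reads, and L the gene length in bp. The sketch below is only that textbook formula in Python. The read counts and library size are made-up numbers, and Cuffdiff's own estimate uses a more elaborate length and fragment model, so its values will differ.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Textbook RPKM: reads per kilobase of gene per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical numbers, not taken from the data in this thread:
# the same 100 reads assigned to a 75 bp gene and to a 2000 bp gene,
# in a library of 20 million mapped reads.
short_gene = rpkm(read_count=100, gene_length_bp=75, total_mapped_reads=20_000_000)
long_gene = rpkm(read_count=100, gene_length_bp=2000, total_mapped_reads=20_000_000)

print(round(short_gene, 2))  # 66.67
print(round(long_gene, 2))   # 2.5
```

Because L sits in the denominator, a handful of reads landing on a gene shorter than 100 bp is enough to push its RPKM far above that of a longer gene with the same read count.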
I am trying to connect through FTP to transfer large files. Until today everything went fine, but I tried many times to connect today and each time the FTP connection was immediately interrupted. Thanks for your help.
I've got an amplicon that has fixed and variable sequence segments, and I'm
sequencing it with 180-base MiSeq reads. I'd like to align the reads using
the fixed segment. Is this an appropriate job for either Bowtie or BWA
(using a reference sequence that is necessarily shorter than the reads)?
If not, how should I do it?
Seth Stern, Ph.D.
650 996 8726 (cell)
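As one possible way to anchor reads on the fixed segment without an aligner, the sketch below slides the fixed sequence along each read, keeps the position with the fewest mismatches, and returns the flanking variable parts. It is an illustration only: the function names, the example sequences, and the mismatch threshold are assumptions, it ignores indels and reverse-complement reads, and it says nothing about whether Bowtie or BWA would handle this case well.

```python
def locate_fixed_segment(read, fixed, max_mismatches=2):
    """Return (start, mismatches) of the best ungapped match, or None."""
    best = None
    for start in range(len(read) - len(fixed) + 1):
        window = read[start:start + len(fixed)]
        mismatches = sum(a != b for a, b in zip(window, fixed))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (start, mismatches)
    return best


def split_on_fixed_segment(read, fixed, max_mismatches=2):
    """Return the (upstream, downstream) variable parts, or None if no match."""
    hit = locate_fixed_segment(read, fixed, max_mismatches)
    if hit is None:
        return None
    start, _ = hit
    return read[:start], read[start + len(fixed):]


# Hypothetical sequences: the read carries the fixed segment with one mismatch.
fixed_segment = "ACGTACGTAC"
example_read = "TTTT" + "ACGTACGAAC" + "GGGG"
print(locate_fixed_segment(example_read, fixed_segment))    # (4, 1)
print(split_on_fixed_segment(example_read, fixed_segment))  # ('TTTT', 'GGGG')
```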
There is a rice genome available for use with most tools already on the
public Main Galaxy server at https://main.g2.bx.psu.edu
(http://usegalaxy.org). On the tool form, for the option to select a
reference genome, or on the Upload or Edit Attributes form when
selecting genome/build, type in the keyword "rice" to bring up:
Rice (Oryza sativa): oryza_sativa_japanica_nipponbare_IRGSP4.0
For genomes not included as built-in native indexes (perhaps you wish to
use a different strain), a Custom reference genome can also be used with
most tools on Main. Instructions are here:
There are a few tools on Main that do not allow for Custom reference
genomes - these have a fixed set of specified target genomes. Megablast
is one such tool. To use an alternate genome with this tool and the few
others like it, a local or cloud Galaxy is required. Instructions are here:
If you need to add a reference genome to a local Galaxy (a cloud Galaxy
would already include everything currently on Main), you can create the
indexes yourself, or rsync our files for use in your instance. The rsync
server also lists all genomes currently indexed on Main. Instructions
Hopefully this helps,
On 7/17/13 11:43 PM, Mitali Merchant wrote:
> I wanted to know the genome databases associated with the Galaxy server. I
> want to perform peak calling using the Galaxy server on data from Oryza
> sativa (rice genome), but since this genome is not available in Galaxy,
> I want to know how I can get this genome as a reference.
> Kindly reply as soon as possible; it will be of great help.
> Mitali Merchant
Galaxy Support and Training
I am working on RNA-seq using TopHat/Cufflinks/Cuffdiff for differential
gene expression and new gene discovery (this is what I am interested in).
However, I found many genes that are repeated in the Cuffdiff output.
These are the same genes at exactly the same locus. There should be
only one line per gene. Something like this:
Genes  Locus                Status  q1       q2       Log2 fold change  Significance
Lnp    2:74517521-74584544  OK      8.91501  85.2735  3.25779           yes
Lnp    2:74517521-74584544  OK      12.0044  171.352  3.83533           yes
If I re-run Cuffdiff for differential gene expression only (no gene
discovery), the problem goes away. Does anyone know how to explain and fix this?
Thank you so much.
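To see how many genes are affected, the repeated entries can be listed programmatically. The sketch below assumes a tab-delimited Cuffdiff gene table with the `test_id`, `gene`, and `locus` columns seen in Cuffdiff output. It groups rows by gene name and locus and reports every pair that occurs more than once. The `XLOC_A` and `XLOC_B` identifiers and the function name are hypothetical placeholders; only the Lnp values come from the example above.

```python
import csv
import io
from collections import defaultdict


def find_repeated_genes(handle):
    """Map (gene, locus) -> list of test_ids for pairs that appear more than once."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        groups[(row["gene"], row["locus"])].append(row["test_id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Small in-memory stand-in for a Cuffdiff gene table (test_ids are hypothetical).
sample_table = (
    "test_id\tgene\tlocus\tstatus\tvalue_1\tvalue_2\n"
    "XLOC_A\tLnp\t2:74517521-74584544\tOK\t8.91501\t85.2735\n"
    "XLOC_B\tLnp\t2:74517521-74584544\tOK\t12.0044\t171.352\n"
)
repeated = find_repeated_genes(io.StringIO(sample_table))
for (gene, locus), test_ids in repeated.items():
    print(gene, locus, test_ids)
```

If the repeated rows carry different `test_id` values, that would suggest the assembly step split the gene into more than one locus record, which is worth checking before treating the rows as true duplicates.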
I am trying to get rid of all my previous files in order to upload new ones. I usually use "delete permanently" for that, but this time I only used "delete". Now I am stuck: the site says I am using 69% of my account quota, although my history is empty and 100% should be available for new analyses.
I tried "purge deleted datasets", but to no avail.
Thanks for your help.
I have created a workflow for metagenomic analysis.
I am using the "Fetch taxonomic representation" tool, but it gives me the
error "ERROR in line 1: Invalid ID tag" irrespective of the input data.
A few questions have been raised about the same error in the blog, but the
solutions are not convincing.
Please help me troubleshoot this error.
Thanks in advance.
I ran TopHat on Galaxy for my RNA-seq data. I want to analyze TopHat's
output files, for example to get the percentage of reads mapped to the
genome, but I am not sure how to do that.
I am also trying to visualize the BAM file in IGB, but the following error
message appears: "Failed to authenticate to the server".
Can anyone help with these issues?
Thanks so much.
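For the mapping percentage, one rough approach is to convert the BAM file to SAM text (for example with `samtools view -h`), count the distinct read names that have at least one mapped alignment, and divide by the number of reads that went into TopHat. The sketch below does this for single-end reads. The function names are hypothetical, and the total input read count is an assumption that has to be supplied separately (for example from the FASTQ file), because a BAM file of accepted hits does not record reads that failed to map.

```python
def count_mapped_reads(sam_lines):
    """Count distinct read names with at least one mapped alignment."""
    mapped = set()
    for line in sam_lines:
        if line.startswith("@"):  # SAM header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        qname, flag = fields[0], int(fields[1])
        if not flag & 0x4:  # FLAG bit 0x4 set means the read is unmapped
            mapped.add(qname)
    return len(mapped)


def percent_mapped(sam_lines, total_input_reads):
    """Percentage of input reads that have at least one mapped alignment."""
    return 100.0 * count_mapped_reads(sam_lines) / total_input_reads


# Tiny hypothetical SAM excerpt: r1 maps twice (multi-hit), r2 once, r3 is unmapped.
example_sam = [
    "@HD\tVN:1.0\tSO:coordinate",
    "r1\t0\tchr1\t100\t50\t4M\t*\t0\t0\tACGT\tIIII",
    "r1\t256\tchr2\t500\t1\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr1\t200\t50\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
print(percent_mapped(example_sam, total_input_reads=4))  # 50.0
```

Counting read names rather than alignment lines matters because a read that maps to several places appears on several lines.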
I recently uploaded about 1.8 GB of MiSeq reads (using ftp). The upload
seemed to go fine, but Galaxy seems to be very unhappy with the file. I'm
seeing the following message in History:
format: txt, database:
ZIP file contained more than one file, only the first file was added to
Can anyone tell me how to diagnose the problem?
Seth Stern, Ph.D.
650 996 8726 (cell)
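The History message suggests the uploaded archive holds more than one member. One way to check this before uploading is to list the archive contents locally. The sketch below uses Python's standard `zipfile` module. The member names are hypothetical, and the in-memory archive only stands in for the real file, whose path would be passed to the function instead.

```python
import io
import zipfile


def list_zip_members(source):
    """Return the names of all non-directory members of a ZIP archive."""
    with zipfile.ZipFile(source) as archive:
        return [info.filename for info in archive.infolist() if not info.is_dir()]


# Build a small in-memory archive with two hypothetical FASTQ members.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("reads_R1.fastq", "@r1\nACGT\n+\nIIII\n")
    archive.writestr("reads_R2.fastq", "@r1\nTGCA\n+\nIIII\n")
buffer.seek(0)

members = list_zip_members(buffer)
print(members)  # ['reads_R1.fastq', 'reads_R2.fastq']
```

If more than one name is listed, compressing each file separately before upload should avoid the "only the first file was added" behaviour described in the message.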
For those who use the snpEff V3_3c release in Galaxy: there are two small
bugs in snpSiftWrapper.pl.
You can find the corrected file in the attachment.
Institut Gustave Roussy
Bâtiment de Médecine Moléculaire
114, rue Edouard Vaillant
94805 Villejuif Cedex - France