July 2013 - galaxy-user - lists.galaxyproject.org

Exceptionally high RPKM values of miRNA and other short genes in Cuffdiff's output
by Hoang, Thanh 19 Jul '13

Hi all, I have been analyzing my RNA-seq data on mouse tissues. My RNA-seq data is single-end, with reads 51 bp in length. I ran TopHat/Cufflinks/Cuffdiff to test for differential gene expression. In Cuffdiff's output, I got very high RPKM values for some miRNAs and some other short genes (less than 100 bp). These genes are among the genes with the highest RPKM. I think the RPKM values of these genes are probably too high to be true.

test_id             gene_id             gene     locus                   sample_1    sample_2  status  value_1   value_2   log2(fold_change)  test_stat  p_value  q_value   significant
ENSMUSG00000093077  ENSMUSG00000093077  Mir5105  5:146231229-146302874   Epithelium  Fiber     OK      1.53E+06  445558    -1.78097           -355.367   0.00715  0.016986  yes
ENSMUSG00000093098  ENSMUSG00000093098  Gm22641  7:130162450-133124354   Epithelium  Fiber     OK      87894.1   36474.7   -1.26887           -0.59863   0.4913   0.587174  no
ENSMUSG00000089855  ENSMUSG00000089855  Gm15662  10:105187662-105583874  Epithelium  Fiber     OK      42868.9   21566.5   -0.99114           -20.7066   0.0186   0.039568  yes
ENSMUSG00000092984  ENSMUSG00000092984  Mir5115  2:73012853-73012927     Epithelium  Fiber     OK      21104.8   8317.49   -1.34335           -447.314   0.0001   0.000354  yes
ENSMUSG00000086324  ENSMUSG00000086324  Gm15564  16:35926510-36037131    Epithelium  Fiber     OK      6443.35   3664.15   -0.81433           -1.52095   0.2129   0.301429  no
ENSMUSG00000092981  ENSMUSG00000092981  Mir5125  17:23803186-23824739    Epithelium  Fiber     OK      5974.14   2390.75   -1.32127           -0.34111   0.5746   0.661937  no

I checked some forums, and they said that this is a drawback of TopHat/Cufflinks/Cuffdiff when dealing with short genes. But I am still not clear about this. Has anyone had the same problem? What can I do in this situation? Can anyone suggest other good tools to test for (1) differential gene expression OR (2) both differential gene expression and gene discovery?

Thank you
Thanh
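
For context, a minimal sketch of the textbook RPKM formula (reads per kilobase of gene per million mapped reads) illustrates why very short genes can reach large values: the gene length sits in the denominator. The read count and library size below are hypothetical and are not taken from the data in this post, and Cuffdiff's own estimate may differ from this simple formula.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Textbook RPKM: reads per kilobase of gene per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical numbers, chosen only for illustration.
LIBRARY_SIZE = 20_000_000   # assumed total number of mapped reads
READS_ON_GENE = 500         # the same read count assigned to both genes

short_gene = rpkm(READS_ON_GENE, 75, LIBRARY_SIZE)     # 75 bp, miRNA-sized gene
long_gene = rpkm(READS_ON_GENE, 2000, LIBRARY_SIZE)    # 2 kb gene

print(f"75 bp gene:   {short_gene:.1f} RPKM")
print(f"2000 bp gene: {long_gene:.1f} RPKM")
print(f"ratio:        {short_gene / long_gene:.1f}x")
```

With 51 bp reads, a gene shorter than 100 bp is barely longer than a single read, so even a modest number of reads is divided by a very small length.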

Problem with FTP
by GANDRILLON OLIVIER 18 Jul '13

Hi, I am trying to connect through FTP to transfer large files. Up to today everything went fine. But I tried many times to connect today, and each time the FTP connection was instantly interrupted. Thanks for your help. Best, Olivier

Amplicon Sequencing Questions
by Seth Stern 18 Jul '13

Hi Everyone, I've got an amplicon that has fixed and variable sequence segments, and I'm sequencing it with 180-base MiSeq reads. I'd like to align the reads using the fixed segment. Is this an appropriate job for either Bowtie or BWA (using a reference sequence that is necessarily shorter than the reads)? If not, how should I do it? Thanks, Seth Seth Stern, Ph.D. sethster(a)comcast.net 650 996 8726 (cell)

database associated with galaxy server
by Jennifer Jackson 18 Jul '13

Hello Mitali,

There is a rice genome available for use with most tools already on the public Main Galaxy server at https://main.g2.bx.psu.edu (http://usegalaxy.org). On the tool form, for the option to select a reference genome, or on the Upload or Edit Attributes form when selecting genome/build, type in the keyword "rice" to bring up:

Rice (Oryza sativa): oryza_sativa_japanica_nipponbare_IRGSP4.0

For genomes not included as built-in native indexes (perhaps you wish to use a different strain), a Custom reference genome can be also used with most tools on Main. Instructions are here:
http://wiki.galaxyproject.org/Support#Custom_reference_genome

There are a few tools on Main that do not allow for Custom reference genomes - these have a fixed set of specified target genomes. Megablast is one such tool. To use an alternate genome with this tool and the few others like it, a local or cloud Galaxy is required. Instructions are here:
http://getgalaxy.org
http://usegalaxy.org/cloud

If you need to add a reference genome to a local Galaxy (a cloud Galaxy would already include everything currently on Main), you can create the indexes yourself, or rsync our files for use in your instance. The rsync server also lists all genomes currently indexed on Main. Instructions are here:
http://wiki.galaxyproject.org/Admin/Data%20Integration
http://wiki.galaxyproject.org/Admin/NGS%20Local%20Setup

Hopefully this helps,

Jen
Galaxy team
http://wiki.galaxyproject.org/Support#Mailing_Lists

On 7/17/13 11:43 PM, Mitali Merchant wrote:
> hi,
>
> i wanted to know the genome databases associated with Galaxy server. i
> want to perform peak calling using galxy server of the data of Oryza
> Sativa (rice genome) but since this genome is not available in galaxy,
> i want to know how can i get this genome as reference.
> Kindly reply as soon as possible,it will be of great help
>
> Mitali Merchant
> IIIT,Hyderabad
> India

--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org

Problem with repeated genes in Cuffdiff's output
by Hoang, Thanh 17 Jul '13

Hi all, I am working on RNA-seq using TopHat/Cufflinks/Cuffdiff for differential gene expression and new gene discovery (this is what I am interested in). However, I found many genes that are repeated in Cuffdiff's output. These are the same genes at exactly the same locus. There should be only one line per gene. Something like this:

Genes  Locus                Status  q1       q2       Log2 fold change  Significance
Lnp    2:74517521-74584544  OK      8.91501  85.2735  3.25779           yes
Lnp    2:74517521-74584544  OK      12.0044  171.352  3.83533           yes

If I re-run Cuffdiff for differential gene expression only (no gene discovery), the problem is fixed. Does anyone know how to explain and fix this? Thank you so much

Emptying my account for good
by GANDRILLON OLIVIER 16 Jul '13

Hi, I am trying to get rid of all my previous files in order to upload new ones. I usually use "delete permanently" for that, but this time I only used "delete". And now I am stuck: the site says I am using 69% of my account, although my history is empty, everything is empty, and 100% should be available for new analysis... I tried "purge deleted datasets", but to no avail. Thanks for your help. Best, Olivier Gandrillon

Error running Metagenomic Analysis
by Pavan Kumar 15 Jul '13

Hello, I have created a workflow for metagenomic analysis. I am using the fetch taxonomic representation tool, but it shows the error "ERROR in line 1: Invalid ID tag" irrespective of the data. A few questions have been raised about the same error in the blog, but the solution is not convincing. Please help me troubleshoot the above error. Thanks in advance -- Pavan

Evaluating TopHat's results
by Hoang, Thanh 15 Jul '13

Hi, I ran TopHat on Galaxy for my RNA-seq data. I want to analyze TopHat's output files, for example to get the percentage of reads mapped to the genome, but I am not sure how to do that. I am also trying to visualize the BAM file in IGB, but the following error message appears: "Failed to authenticate to the server". Can anyone help with these issues? Thanks so much, Thanh

Extreme Newbie Question
by Seth Stern 15 Jul '13

Hi Everyone, I recently uploaded about 1.8 GB of MiSeq reads (using ftp). The upload seemed to go fine, but Galaxy seems to be very unhappy with the file. I'm seeing the following message in History:

1: miseq_R1-03-07-2013.zip
empty
format: txt, database: <https://main.g2.bx.psu.edu/datasets/bbd44e69cb8906b504f52516868a1f84/edit> ?
ZIP file contained more than one file, only the first file was added to Galaxy.

Can anyone tell me how to diagnose the problem?

Many thanks, Seth Seth Stern, Ph.D. sethster(a)comcast.net 650 996 8726 (cell)
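
Since the message reports that the ZIP file held more than one file, one way to check an archive locally before uploading is to list its members. The sketch below uses Python's standard zipfile module; the helper name list_zip_members and the member names in the demo are hypothetical, and the demo builds a small archive in memory so that it runs on its own.

```python
import io
import zipfile


def list_zip_members(source):
    """Return (name, uncompressed size in bytes) for every file in a ZIP archive.

    `source` may be a path or a file-like object; directory entries are skipped.
    """
    with zipfile.ZipFile(source) as archive:
        return [(info.filename, info.file_size)
                for info in archive.infolist()
                if not info.is_dir()]


# Self-contained demo: build a small two-file archive in memory.
# The member names are hypothetical stand-ins for paired read files.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("reads_R1.fastq", "@r1\nACGT\n+\nIIII\n")
    archive.writestr("reads_R2.fastq", "@r1\nTGCA\n+\nIIII\n")
demo.seek(0)

members = list_zip_members(demo)
for name, size in members:
    print(f"{name}\t{size} bytes")
if len(members) > 1:
    print("The archive holds more than one file.")
```

For a real upload, pass the path of the local archive to list_zip_members instead of the in-memory demo object.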

bugs of snpSiftWrapper.pl corrected
by Yufei LUO 15 Jul '13

Hi, For those who use the snpEff V3_3c release in Galaxy: there are two little bugs in snpSiftWrapper.pl. You can find the corrected file in the attachment. Best Regards, -- Yufei LUO Bioinformatician Institut Gustave Roussy Bâtiment de Médecine Moléculaire Bureau N°136 114, rue Edouard Vaillant 94805 Villejuif Cedex - France
