December 2010 - galaxy-user - lists.galaxyproject.org

Re: [galaxy-user] Question about using replicate samples and Refrence Ensembl
by Jeremy Goecks 23 Dec '10

Hi Vasu,

Please cc the galaxy-user email list so that everyone can benefit from this discussion and so that it is archived. On to your questions:

>> OK I have two replicates of sample. My question is when I have to run Cuffcompare or cuffdiff I have to use them as single file in RNA seq analysis. Either I can combine bam files or combine before running Tophat. What is suggested.

The answer depends on what you're looking for. There are two options:

(1) If you're looking for differences between the two samples--e.g. the samples came from two different tissues or from two different time periods in the same tissue--you should run each sample through Tophat & Cufflinks and then run Cuffcompare and Cuffdiff on the GTF files generated by Cufflinks.

(2) If you're looking for differences within the two samples--e.g. the samples are two lanes of sequencing data from the same biological replicate--then you should combine all your reads before running Tophat-->Cufflinks-->Cuffcompare.

>> Secondly I have downloaded Ensembl file as suggested by You but than how to make it happen that when I do the analysis Cufflink or cuffcompare read this file.

The Ensembl gene annotation GTF should be used as the "reference" for Cuffcompare; optionally, you can also use the GTF as a reference for Cufflinks.

Best,
J.
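For option (2), "combine all your reads" can be as simple as concatenating the per-lane FASTQ files before running Tophat. The following is a minimal sketch only, assuming uncompressed single-end FASTQ files; the function name `combine_fastq` and the file names in the usage note are hypothetical and not part of Galaxy or Tophat.

```python
def combine_fastq(lane_files, combined_path):
    """Concatenate FASTQ files in the order given and return the output path.

    Assumption: the inputs are plain (uncompressed) FASTQ files.
    """
    with open(combined_path, "wb") as out:
        for lane in lane_files:
            last_byte = b"\n"
            with open(lane, "rb") as src:
                # Copy in 1 MB chunks so large files never sit fully in memory.
                for chunk in iter(lambda: src.read(1 << 20), b""):
                    out.write(chunk)
                    last_byte = chunk[-1:]
            # Make sure the next file starts on a fresh line.
            if last_byte != b"\n":
                out.write(b"\n")
    return combined_path
```

A call might look like `combine_fastq(["lane1.fastq", "lane2.fastq"], "replicate1.fastq")`. For paired-end data, the forward and reverse files would each need to be combined separately with the lanes in the same order, so that mates stay aligned.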

Question about using replicate samples and Refrence Ensembl
by vasu punj 22 Dec '10

I am running an RNA-seq analysis. When I run Tophat, how can I include replicates of samples? Secondly, I need to include the Ensembl reference database to get IDs in Cufflinks, but I don't see any option for this in Galaxy. Any suggestions, please?

NGS: Copy Number Modules?
by Richard Park 17 Dec '10

Hi Everyone, I'm new to Galaxy and was just wondering if there are any modules for copy number analysis for NGS data? Thanks, Richard Park

Upload problem
by Asuncion Lago 17 Dec '10

Hi, I am trying to upload three files containing Illumina data that I would like to analyze using Galaxy. I have spent more than two days trying. After 24h I decided to upload them one by one, but the result was the same: the file has been uploading for 23h and is still not finished. The sizes of the fastq files are 1.1GB, 1.5GB and 1.6GB. Is it normal that it takes so much time? Is there another way to upload the files? Any suggestions? Thanks in advance, Asun

SRA files on Galaxy
by David Coil 17 Dec '10

Greetings Galaxy team, Are there any plans for Galaxy to be able to deal with SRA files? NCBI no longer produces fastq dumps of data, everything is now maintained in SRA format. This means that in order to use the data on Galaxy, one must download the SRA, convert it to fastq and then upload this new, much larger file to Galaxy. The conversion tool seems fairly simple, would it be possible to incorporate into Galaxy such that the SRA files could be sent over directly from NCBI? Thanks for all the great work! David Coil, UC Davis Genome Center

bed file visualization with trackster
by Stewart Noyce 17 Dec '10

Attempted to demo trackster visualization on the main site (main.g2.bx.psu.edu) today.

- Logged in as required.
- Used the _UCSC Main_ table browser to download a coverage file in the bed format for hg18.
- Viewed the first MB of four column data (chrom, chromStart, chromEnd, name), which clearly contains annotation data for 'chr1'.
- Clicked on the _visualization in Trackster_ glyph.
- Created a Human browser with a Dbkey of hg18.
- Added the downloaded file as a track.
- Selected chr1 and nothing shows up in the visualization screen.
- Clicked through all magnification levels and moved left to right through the chromosome view.

Am I missing something, or is this a bug?

Stewart

Galaxy Roadshow | San Diego, CA | Jan 2011
by Anton Nekrutenko 17 Dec '10

Dear Galaxy Users and Developers: Dan Blankenberg (dan(a)bx.psu.edu) from the Galaxy Team will be in San Diego between 14 and 20 January. If anyone in the SD area runs a local Galaxy install and would like a bit of face-to-face time with Dan (he is one of the senior developers on the project), e-mail him directly. Thanks! anton Anton Nekrutenko http://nekrut.bx.psu.edu http://usegalaxy.org

Problem with stale pages - proxy issue?
by Peter 17 Dec '10

Hi all, I've been using Galaxy happily from both Mac and Linux machines, both on http://usegalaxy.org and our local server. Today I wanted to give a demo so I tried using Galaxy on a few Windows machines (problems with IE6 cropped up). However, I found a general problem (also affecting Firefox and Chrome) with stale pages... e.g. The history on the right hand column often required a manual refresh (via the blue icon top left of the history column) after running a tool. Also, and more confusingly, on logging in or logging out, the user menu did not automatically get refreshed, nor did the workflow page - it would show me workflows when logged out, or say I was logged out when I was logged in. In all these cases a hard refresh (shift/control + F5) made things right. My *guess* is that our Windows machines are all going via an institute proxy, while the Mac and Linux machines are not. Has anyone experienced anything similar, and could offer any tips for what to check? Thanks, Peter

Setting up a local Galaxy instance
by Weiner, Michael 16 Dec '10

I have been asked by a research fellow to setup a local instance of Galaxy. In addition, he would like the latest versions of packages like bowtie, cufflinks, tophat, etc to be installed and available through that instance. Being a newbie to Galaxy, and just a systems administrator responsible for building this environment, I am unsure how exactly to go about doing this. I have an instance running, and worked my way through the fastx-toolkit installation/update but I cannot seem to find a similar how-to for the other packages. Could someone point me in the right direction please? Thank you in advance Michael Weiner UNIX Systems Administrator Lerner Research Institute Cleveland Clinic

Fwd: Cufflinks
by David Matthews 16 Dec '10

Hi Jeremy,

Just got this from Cole - the -s option is needed for sure and Adam also thinks that the version needs to be upgraded to the latest one - sorry to add more work to your plate!! Glad to get to the bottom of it at last - perhaps a post in Seqanswers would be worthwhile...

David

Begin forwarded message:

> From: Cole Trapnell <cole(a)cs.umd.edu>
> Date: 8 December 2010 18:19:53 GMT
> To: David Matthews <D.A.Matthews(a)bristol.ac.uk>
> Cc: Adam Roberts <adarob(a)gmail.com>
> Subject: Re: Cufflinks
>
> Ah. Yes, we really need to add a note in the manual about this.
>
> Getting the p_ids to work requires both that your GTF file have annotated "CDS" type records (not just "exon" type records like the Cufflinks assembler spits out), and that you also supply a reference genome sequence with -s. David, is that what you're doing?
>
> C
>
> On Dec 8, 2010, at 1:15 PM, David Matthews wrote:
>
>> Hi Adam (and Cole),
>>
>> Thanks for the email, I certainly appreciate your situation and I imagine the popularity of the cufflinks suite means you get a lot of emails on that as well! The version they are using is 0.9.1 but the p_id problem seems to be a theme in seqanswers which is where the suggestion to use the -s option came from. Cole do you have any thoughts on the lack of p_ids and cds files? The galaxy team plan to introduce the -s option asap...
>>
>> Thanks again for your patience and input.
>>
>> Cheers
>> David
>>
>> On 8 Dec 2010, at 18:08, Adam Roberts wrote:
>>
>>> David,
>>>
>>> Sorry I haven't been responsive lately. I've definitely been swamped: writing a paper, finishing class projects, and planning a course I'm TAing next semester -- All of which have to be done before I go on vacation on Monday. Obviously, this means that I might not get to all of your issues for a couple of weeks, but I will try my best.
>>>
>>> The Cuffdiff thing definitely looks like it may be a bug. Is Galaxy using 0.9.3? If not, then it is probably the bug in 0.9.2 that we fixed with that release. Concerning the p_id issue, I honestly don't know much about these details of cuffcompare and cuffdiff, as my work has primarily been in the abundance estimation code. I suggest you email Cole about that, and let me know if he doesn't respond within a few days.
>>>
>>> -Adam
>>>
>>> On Wed, Dec 8, 2010 at 7:43 AM, David Matthews <D.A.Matthews(a)bristol.ac.uk> wrote:
>>> Hi again Adam,
>>>
>>> Just been running a long email chat with the team at Galaxy and they too cannot get p_ids attached and their analysis shows no data in the cds files from cuffdiff - any news on that? Some people are suggesting the -s option must be run on cuffcompare - does this make sense to you?
>>>
>>> Cheers
>>> David
>>>
>>> On 3 Dec 2010, at 16:34, Adam Roberts wrote:
>>>
>>>> No need to apologize. We really want Cufflinks to be useful and not just a method that we write papers about.
>>>>
>>>> This problem with cuffdiff you mention definitely should not be happening, but before I go looking for a bug, I want to be sure that it really is happening. Are you sure you are looking at the matching sample? Also, can you give me a few lines from the output that show the different results?
>>>>
>>>> Thanks.
>>>>
>>>> -Adam
>>>>
>>>> On Fri, Dec 3, 2010 at 3:32 AM, David Matthews <D.A.Matthews(a)bristol.ac.uk> wrote:
>>>> Thanks Adam, that makes more sense to me now. A second question that has popped up is to do with analysing a 3 point time course. In Galaxy you can only get cuffdiff to look at two time points at once. But if I look at the output there is an oddity, the fpkms for a gene at the middle time point are not always the same in both analyses. For example, if I look at a gene called RAD50 at times 0h, 8h and 24h and compare 0 and 8 then compare 8 and 24 the fpkm for this gene at 8 hours should be the same in both analyses but this is not always the case. Any thoughts on this?
>>>>
>>>> P.S. I know it sounds like I'm moaning but I am really impressed with the whole suite of programs and especially now it's on Galaxy. Previously I was doing it all on my iMac and it struggled to do anything else sometimes! What this suite of programs offers is far superior to anything else anyone else is offering (especially the people at Sanger here in the UK who don't seem to be very interested in helping biologists like me get to grips with this kind of stuff!).
>>>>
>>>> On 3 Dec 2010, at 04:07, Adam Roberts wrote:
>>>>
>>>>> Also, cuffdiff has code in it that automatically sums these. We have not implemented such code in cufflinks at this point, but we may do so in the future.
>>>>> The FAIL means that the likelihood maximization algorithm did not converge on an isoform deconvolution. You can try setting -f 0.1, which will filter out isoforms with less than 10% of the expression for the gene and may help the calculation to converge.
>>>>>
>>>>> -Adam
>>>>>
>>>>> On Thu, Dec 2, 2010 at 8:05 PM, Adam Roberts <adarob(a)gmail.com> wrote:
>>>>> Cufflinks creates XLOCs based on sets of overlapping transcripts. Sometimes, a gene will have transcripts that don't share any exons, and will therefore be bundled into separate XLOCs. I'm assuming that is what is happening here. If you only care about the total for the whole gene, you can just take the sum.
>>>>>
>>>>> -Adam
>>>>>
>>>>> On Thu, Dec 2, 2010 at 3:51 PM, David Matthews <D.A.Matthews(a)bristol.ac.uk> wrote:
>>>>> Hi,
>>>>>
>>>>> Thanks for emailing me back about this. Here is a good example of a gene reported many times by cuffdiff in the "genes FPKM tracking file":
>>>>>
>>>>> XLOC_008048 - - ABR - chr17:906640-1012618 20.5239 19.4454 21.6025 20.49 19.4152 21.5649
>>>>> XLOC_008049 - - ABR - chr17:906640-1012618 8.65474 7.90632 9.40316 5.3011 4.84316 5.75905
>>>>> XLOC_008050 - - ABR - chr17:906640-1012618 31.345 29.7825 32.9074 24.4523 23.242 25.6627
>>>>> XLOC_008051 - - ABR - chr17:906640-1012618 87.0488 82.0643 92.0333 170.051 160.486 179.615
>>>>> XLOC_008052 - - ABR - chr17:906640-1012618 4.60255 4.27034 4.93476 0 0 0
>>>>> XLOC_008053 - - ABR - chr17:906640-1012618 39.4463 37.6705 41.2221 59.5061 56.8463 62.166
>>>>> XLOC_008054 - - ABR - chr17:906640-1012618 40.3393 38.0356 42.6429 16.5153 15.586 17.4446
>>>>> XLOC_008055 - - ABR - chr17:906640-1012618 3.07796 2.88305 3.27286 5.12889 4.80319 5.45459
>>>>> XLOC_008056 - - ABR TSS5913 chr17:906640-1012618 7.7536 7.3448 8.1624 7.78601 7.37482 8.1972
>>>>> XLOC_008057 - - ABR - chr17:906640-1012618 37.9869 35.9915 39.9823 19.9239 18.8859 20.9618
>>>>> XLOC_008058 - - ABR - chr17:906640-1012618 16.0369 14.9386 17.1352 7.83102 7.29567 8.36637
>>>>> XLOC_008059 - - ABR - chr17:906640-1012618 53.3928 51.194 55.5917 20.1807 19.3588 21.0025
>>>>> XLOC_008060 - - ABR - chr17:906640-1012618 17.99 16.9108 19.0692 19.626 18.4516 20.8004
>>>>> XLOC_009397 - - ABR TSS6816,TSS6817 chr17:906640-1012618 14.2054 8.00403 20.4068 12.0799 6.83849 17.3213
>>>>>
>>>>> As you can see the XLOC is different, but the gene name and the chromosome locations are all the same. I have used the "group data" tool on the Galaxy website to sum the FPKMs for every gene with the same gene name to get a single set of numbers for a given gene, but it seems to me that this file should have done it already or maybe I'm misunderstanding something.
>>>>>
>>>>> Thanks again!
>>>>>
>>>>> Best Wishes,
>>>>> David
>>>>>
>>>>> P.S. I'm using the latest ensembl.gtf file when I run tophat and when I run cuffcompare to generate the combined gtf file. In both cases I do NOT restrict the program to annotated genes only.
>>>>>
>>>>> On 2 Dec 2010, at 23:14, Adam Roberts wrote:
>>>>>
>>>>>> Can you send me some example lines from the files?
>>>>>>
>>>>>> On Wed, Dec 1, 2010 at 7:10 AM, David Matthews <D.A.Matthews(a)bristol.ac.uk> wrote:
>>>>>> Dear Adam,
>>>>>>
>>>>>> I am using cufflinks to look at mRNA expression in virus infected cells. I am using cufflinks on the Galaxy server. When I run through the normal workflow, the files produced by cuffdiff do not seem to be quite right. The main issue is that both the gene expression files and the genes fpkm files contain repeat sets of numbers for the same gene - as I understand it those files should contain a summed value for everything assigned to that gene name - am I misunderstanding the problem? Am I doing something wrong? I use the ensembl gtf file during cuffcompare and use the combined gtf files for the cuffdiff along with the relevant sam files. If, on the other hand I run cuffdiff with the same files and the ensembl gtf file I get a single number for each gene (although strangely many genes fail to give any expression data and are reported as "FAIL").
>>>>>>
>>>>>> What am I doing wrong?
>>>>>>
>>>>>> Hope you can help,
>>>>>>
>>>>>> Cheers
>>>>>> David
>>>>>>
>>>>>> __________________________________
>>>>>> Dr David A. Matthews
>>>>>>
>>>>>> Senior Lecturer in Virology
>>>>>> Room E49
>>>>>> Department of Cellular and Molecular Medicine,
>>>>>> School of Medical Sciences
>>>>>> University Walk,
>>>>>> University of Bristol
>>>>>> Bristol.
>>>>>> BS8 1TD
>>>>>> U.K.
>>>>>>
>>>>>> Tel. +44 117 3312058
>>>>>>
>>>>>> D.A.Matthews(a)bristol.ac.uk
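The workaround discussed in this thread, summing the FPKMs of every XLOC that shares a gene name, can be sketched in a few lines of Python. This is an illustration only, not what Galaxy's "group data" tool or cuffdiff does internally. The column positions are an assumption read off the example rows (field 3 is the gene name, fields 6 and 9 are the FPKM values for the two samples, each followed by what look like lower and upper bounds), and the function name `sum_fpkm_by_gene` is hypothetical.

```python
from collections import defaultdict

# Three of the ABR rows quoted in the thread, whitespace-separated.
ROWS = """\
XLOC_008048 - - ABR - chr17:906640-1012618 20.5239 19.4454 21.6025 20.49 19.4152 21.5649
XLOC_008049 - - ABR - chr17:906640-1012618 8.65474 7.90632 9.40316 5.3011 4.84316 5.75905
XLOC_008052 - - ABR - chr17:906640-1012618 4.60255 4.27034 4.93476 0 0 0
"""


def sum_fpkm_by_gene(lines, name_col=3, fpkm_cols=(6, 9)):
    """Sum the FPKM columns over all rows that share a gene name.

    Assumption: name_col and fpkm_cols match the layout of the rows above;
    adjust them for a real tracking file.
    """
    totals = defaultdict(lambda: [0.0] * len(fpkm_cols))
    for line in lines:
        fields = line.split()
        if len(fields) <= max(fpkm_cols):
            continue  # skip blank or truncated lines
        try:
            values = [float(fields[col]) for col in fpkm_cols]
        except ValueError:
            continue  # skip a header row or other non-numeric line
        for i, value in enumerate(values):
            totals[fields[name_col]][i] += value
    return dict(totals)


if __name__ == "__main__":
    for gene, sums in sum_fpkm_by_gene(ROWS.splitlines()).items():
        print(gene, *sums)
```

As Adam notes, a plain sum is only appropriate if you care about the total for the whole gene. The lower and upper bounds are left out of the sketch because it is not obvious that they can be combined by simple addition.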
