August 2011 - galaxy-user - lists.galaxyproject.org

Re: [galaxy-user] question about the GATK tools
by Daniel Blankenberg 18 Aug '11

Hi Xiaojing,

Data for hg17 is not yet available for this toolset. You may still be able to use this tool if you upload your own FASTA genome file. Please be sure to respect the GATK genome ordering rules and reorder your BAM file if necessary:
http://www.broadinstitute.org/gsa/wiki/index.php/Preparing_the_essential_GA…

However, because these tools are still experimental and under development, we recommend that you remap your sequences against the "hg_g1k_v37" genome if you want to use them at this time.

Also, please keep all replies on the mailing list.

Thanks for using Galaxy,

Dan

On Aug 18, 2011, at 1:33 PM, Hong, Xiaojing wrote:

> Hi, Dan
>
> I did set the genome build to HG17 but I still can’t see the select box.
>
> Xiaojing Hong
> Ph.D candidate
> Department of Biology
> Manak Lab
> 455 Biology Building
> University of Iowa
> Iowa City, IA 52242
> (P) 319-335-0266
>
> From: Daniel Blankenberg [mailto:dan@bx.psu.edu]
> Sent: Thursday, August 18, 2011 12:19 PM
> To: Hong, Xiaojing
> Cc: galaxy-user(a)lists.bx.psu.edu
> Subject: Re: [galaxy-user] question about the GATK tools
>
> Hi Xiaojing,
>
> You'll need to make sure that the dbkey of your input BAM file is set to a genome build that has data available. Click the pencil icon to set the genome build. If you have set the genome build but still have an empty select box, then data may not be available for this build/tool combination; you can request that the needed data be added for this tool. If your reference genome is not available, you can also upload a FASTA file containing the genome and access it directly from the history.
>
> Thanks for using Galaxy,
>
> Dan
>
> On Aug 17, 2011, at 11:26 AM, Hong, Xiaojing wrote:
>
> > Hi,
> >
> > I just uploaded the BAM file for an exome sequencing sample and was trying to use the GATK tools. In the first step, Realigner Target Creator, I can see my uploaded file but I can’t see any options under the “using reference genome” and the following choices so I can’t click execute. Did I do anything wrong? Thanks!
> >
> > Xiaojing Hong
> > University of Iowa
>
> ___________________________________________________________
> The Galaxy User list should be used for the discussion of
> Galaxy analysis and other features on the public server
> at usegalaxy.org. Please keep all replies on the list by
> using "reply all" in your mail client. For discussion of
> local Galaxy instances and the Galaxy source code, please
> use the Galaxy Development list:
>
> http://lists.bx.psu.edu/listinfo/galaxy-dev
>
> To manage your subscriptions to this and other Galaxy lists,
> please use the interface at:
>
> http://lists.bx.psu.edu/
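
A note on the contig-ordering point in the reply above: one rough way to see whether a BAM file needs reordering is to compare the contig order in its header with the order in the reference FASTA index. The sketch below is only an illustration, not GATK's own validation. It assumes the header text was obtained with `samtools view -H` and that a `.fai` index was produced with `samtools faidx`; the function names are made up for this example.

```python
def sam_header_contigs(header_text):
    """Return contig names from the @SQ lines of a SAM/BAM text header, in order."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def fai_contigs(fai_text):
    """Return contig names from a FASTA index (.fai), in file order."""
    return [line.split("\t")[0] for line in fai_text.splitlines() if line.strip()]


def same_relative_order(bam_contigs, ref_contigs):
    """True if every BAM contig exists in the reference and both lists
    present the shared contigs in the same relative order."""
    position = {name: i for i, name in enumerate(ref_contigs)}
    if any(name not in position for name in bam_contigs):
        return False
    indices = [position[name] for name in bam_contigs]
    return indices == sorted(indices)


if __name__ == "__main__":
    # Tiny invented header and index, for illustration only.
    header = (
        "@HD\tVN:1.0\tSO:coordinate\n"
        "@SQ\tSN:chr1\tLN:100\n"
        "@SQ\tSN:chr10\tLN:50\n"
        "@SQ\tSN:chr2\tLN:80\n"
    )
    fai = (
        "chr1\t100\t6\t60\t61\n"
        "chr2\t80\t114\t60\t61\n"
        "chr10\t50\t202\t60\t61\n"
    )
    print(same_relative_order(sam_header_contigs(header), fai_contigs(fai)))
```

If the check fails, the BAM contigs are either missing from the reference or sorted differently (for example lexicographically rather than in the reference's order), which is the situation the reply warns about.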

question about the GATK tools
by Hong, Xiaojing 18 Aug '11

Hi,

I just uploaded the BAM file for an exome sequencing sample and was trying to use the GATK tools. In the first step, Realigner Target Creator, I can see my uploaded file but I can't see any options under the "using reference genome" and the following choices so I can't click execute. Did I do anything wrong? Thanks!

Xiaojing Hong
University of Iowa

Galaxy and Xgrid
by Paul Cantalupo 18 Aug '11

Hello,

I'm trying to determine if Galaxy will work with Xgrid to set up a small cluster of Macs for next-gen sequencing projects. I saw on http://wiki.g2.bx.psu.edu/Admin/Config/Performance/Production%20Server that Galaxy supports Torque PBS or Sun Grid. I don't think that Sun Grid runs on Snow Leopard and I don't know about Torque. Snow Leopard already comes with Xgrid so I was hoping to use it.

Thank you for your help,

Paul Cantalupo
University of Pittsburgh

Re: [galaxy-user] Cufflinks with reference annotation and without reference annotation
by Jeremy Goecks 17 Aug '11

Crystal,

If you provide a gene annotation to Cufflinks, the transcripts produced will match those in the annotation exactly. If you assemble without a gene annotation, the transcripts produced will match the reference in some cases, but, in others, will not match the reference due to small and/or large errors. Because '=' denotes an exact match between an assembled transcript and a reference transcript, more '=' are to be expected when Cufflinks has a gene annotation.

Finally, a couple of procedural issues:

* please send questions about analyses and tool usage to the galaxy-user mailing list, not galaxy-dev or individual developers;
* please do not send duplicate emails as it can confuse our tracking system and slow down our response rather than speed it up.

Good luck,
J.

On Aug 17, 2011, at 9:14 AM, Crystal Goh wrote:

> Hi, I am Crystal. I have some problem with Cuffdiff output. Hope can get some advice. Thanks.
>
> After aligning RNA-seq reads with Tophat, I used the Tophat output for Cufflinks.
>
> For Cufflinks, I tried two approaches and compared the results:
> 1st approach: Put zebrafish Ensembl GTF as reference annotation
> 2nd approach: without reference annotation.
>
> From the output of above 2 approaches, I continued with Cuffcompare (with reference annotation) and Cuffdiff. Attached word document is the workflow and parameters I set for these 2 approaches.
>
> When I compared the output of Cuffdiff between these 2 approaches, a total of 48584 tracking id with class code "=" was observed in transcript FPKM tracking file from Approach 1, whereas there is only 1248 tracking id with class code '=' from Approach 2 (I attached transcript FPKM tracking files from approach 1 and 2)
>
> In my opinion, I should observe 48584 tracking id with class code '=' and additional tracking id with other class codes in transcript FPKM tracking file from Approach 2.
>
> Can I get advice on this?
>
> Thank you.
>
> Best regards,
> Crystal
> <Workflow and parameter for 2 approaches.zip>
> <Approach 1 Transcript FPKM tracking (Cufflinks with reference annotation).zip>
> <Approach 2 Transcript FPKM tracking (Cufflinks without reference annotation).zip>
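
For readers who want to reproduce the class-code comparison discussed in this thread, the following is a minimal sketch that tallies class codes in a transcript FPKM tracking file. It assumes the file is tab-delimited with a header row containing a `class_code` column; the function name and the sample rows (including the IDs) are invented for illustration and are not taken from the attachments.

```python
import csv
import io
from collections import Counter


def count_class_codes(handle):
    """Count how often each class code appears in a tab-delimited
    tracking file that has a 'class_code' header column (assumed)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row["class_code"] for row in reader)


if __name__ == "__main__":
    # Invented sample rows; a real file has more columns and many more rows.
    sample = io.StringIO(
        "tracking_id\tclass_code\tnearest_ref_id\tFPKM\n"
        "TCONS_00000001\t=\tREF_T1\t12.5\n"
        "TCONS_00000002\tj\tREF_T2\t3.1\n"
        "TCONS_00000003\t=\tREF_T3\t0.0\n"
        "TCONS_00000004\tu\t-\t7.8\n"
    )
    for code, n in count_class_codes(sample).most_common():
        print(code, n)
```

Running the same tally on the tracking files from both approaches makes it easy to see where the transcripts that were '=' with an annotation ended up (for example 'j' or 'u') when assembled without one.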

New software needed
by Yan He 17 Aug '11

Hi Jenn,

I am working on RAD-sequencing. Two available software packages for analyzing RAD tags are Stacks and RADtools. I wonder whether it is possible to integrate these two packages into Galaxy.

Thanks!
Yan

bowtie/bwa not running
by Keith E. Giles 16 Aug '11

I tried to start an alignment with bowtie and bwa last night, and they are still "waiting to run". Is there a server issue?

problems running bowtie and bwa
by Keith E. Giles 16 Aug '11

I am trying to map FASTQ files using bowtie and bwa, and the jobs will not run. They have remained gray for almost 24 hours now. I am able to do other operations. I tried aligning to both a genome from history and hg18, with the same result.

unable to visualize peak data
by Radhouane Aniba 15 Aug '11

Hi everyone,

I have a BED file with reads mapped to the genome, which I filtered to keep only chr1. I then ran MACS on the filtered file to call peaks (the reads are from a histone modification experiment), and the peak file was generated successfully. When I tried to view it in the UCSC Genome Browser, I got this error message:

Error 500: Internal Server Error

Is this related to Galaxy or to the UCSC Genome Browser? I have tried this 4 times and get the same error every time.

Cheers,
Rad

--
R. Aniba
Bioinformatics Postdoctoral Research Scientist
Institute for Advanced Computer Studies
Center for Bioinformatics and Computational Biology (CBCB)
University of Maryland, College Park MD 20742
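
The chr1 filtering step described in this post can also be done outside Galaxy. The sketch below is one simple way to do it, assuming a plain whitespace- or tab-separated BED file whose first column is the chromosome name; the function name is made up for this example. It compares the first field exactly, so that "chr1" does not also pull in "chr10" or "chr1_random".

```python
def filter_bed_by_chrom(lines, chrom="chr1"):
    """Yield BED records whose first column equals `chrom` exactly.
    Blank lines, comments, and 'track'/'browser' header lines are skipped."""
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        if fields[0] in ("track", "browser"):
            continue
        if fields[0] == chrom:
            yield line


if __name__ == "__main__":
    # Invented records, for illustration only.
    bed = [
        "track name=reads\n",
        "chr1\t100\t150\tread1\t0\t+\n",
        "chr10\t200\t250\tread2\t0\t-\n",
        "chr1\t300\t350\tread3\t0\t+\n",
    ]
    for record in filter_bed_by_chrom(bed):
        print(record, end="")
```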

Uploading bulk data to user accounts
by Paul-Michael Agapow 15 Aug '11

So, I'm trying to set up a smooth pathway for our users to import their NGS data into Galaxy. I'd hoped for a solution that would require zero interaction with an administrator, but things seem more awkward than they should be:

1. We can automagically push produced NGS data into a user import directory on the host server. No problem.
2. The user can upload the data via "Shared Data / Data Libraries / <name of data library> / Add datasets / Upload directory of files".
3. _But_ this data library must be created by an administrator, who must then assign appropriate permissions (access and add) for the appropriate user.

Have I got this right? I suppose that the last step only needs to be done once (create an "import library" for every user that I expect to want to import), but it seems a little fiddlier than expected. Is there an easier way, or have I missed something?

----
Paul Agapow (paul-michael.agapow(a)hpa.org.uk)
Bioinformatics, Centre for Infections, Health Protection Agency

**************************************************************************
The information contained in the EMail and any attachments is confidential and intended solely and for the attention and use of the named addressee(s). It may not be disclosed to any other person without the express authority of the HPA, or the intended recipient, or both. If you are not the intended recipient, you must not disclose, copy, distribute or retain this message or any part of it. This footnote also confirms that this EMail has been swept for computer viruses, but please re-sweep any attachments before opening or saving. HTTP://www.HPA.org.uk
**************************************************************************

Any thing wrong with my cufflink process in galaxy?
by yao chen 14 Aug '11

Dear all:

Recently, I ran Cufflinks in Galaxy on the public server. I want to compare two groups of samples. However, I found that no transcript or gene passed the significance threshold, even though many of them have a large FPKM in one sample and 0 FPKM in another. Any thoughts?

Below is my Cufflinks process. I have four samples belonging to two groups: the test group has three samples, and the control group has one sample.

First, using accepted_hits.bam from TopHat, I ran Cufflinks without annotation on each sample.

Then, for the four GTF files from the four samples, I ran Cuffcompare to combine these transcripts and compare them to the annotated genome. However, at this step, I found that the transcript accuracy is very low. See one example:

Missed exons: 10673/11776 ( 90.6%)
Wrong exons: 1254/2007 ( 62.5%)
Missed introns: 8529/8637 ( 98.7%)
Wrong introns: 2/5 ( 40.0%)
Missed loci: 0/504 ( 0.0%)
Wrong loci: 1248/2002 ( 62.3%)

At last, I ran Cuffdiff between the two groups of samples.

Thank you.
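
The percentages in the Cuffcompare summary quoted above can be recomputed from the raw counts, which is handy when comparing several runs. The sketch below parses lines of the form shown in this post; the regular expression and the function name are assumptions based only on those quoted lines, not on a specification of the Cuffcompare output format.

```python
import re

# Matches lines such as "Missed exons: 10673/11776 ( 90.6%)".
STAT_RE = re.compile(r"^\s*(Missed|Wrong) (exons|introns|loci):\s*(\d+)/(\d+)")


def parse_stats(text):
    """Return {label: (count, total, percent)} for each Missed/Wrong line."""
    stats = {}
    for line in text.splitlines():
        match = STAT_RE.match(line)
        if match:
            kind, feature, count, total = match.groups()
            count, total = int(count), int(total)
            percent = 100.0 * count / total if total else 0.0
            stats[f"{kind} {feature}"] = (count, total, percent)
    return stats


if __name__ == "__main__":
    summary = """\
Missed exons: 10673/11776 ( 90.6%)
Wrong exons: 1254/2007 ( 62.5%)
Missed introns: 8529/8637 ( 98.7%)
Wrong introns: 2/5 ( 40.0%)
Missed loci: 0/504 ( 0.0%)
Wrong loci: 1248/2002 ( 62.3%)
"""
    for label, (count, total, percent) in parse_stats(summary).items():
        print(f"{label}: {count}/{total} = {percent:.1f}%")
```

In these numbers, only 5 introns were assembled in total while the reference has 8637, which suggests that most assembled transcripts are single-exon fragments.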
