Thanks Ariel. Bony
-----Original Message-----
From: galaxy-user-bounces@lists.bx.psu.edu [mailto:galaxy-user-bounces@lists.bx.psu.edu] On Behalf Of galaxy-user-request@lists.bx.psu.edu
Sent: Thursday, August 16, 2012 11:00 AM
To: galaxy-user@lists.bx.psu.edu
Subject: galaxy-user Digest, Vol 74, Issue 15
Send galaxy-user mailing list submissions to galaxy-user@lists.bx.psu.edu
To subscribe or unsubscribe via the World Wide Web, visit http://lists.bx.psu.edu/listinfo/galaxy-user or, via email, send a message with subject or body 'help' to galaxy-user-request@lists.bx.psu.edu
You can reach the person managing the list at galaxy-user-owner@lists.bx.psu.edu
When replying, please edit your Subject line so it is more specific than "Re: Contents of galaxy-user digest..."
HEY! This is important! If you reply to a thread in a digest, please
1. Change the subject of your response from "Galaxy-user Digest Vol ..." to the original subject for the thread.
2. Strip out everything else in the digest that is not part of the thread you are responding to.
Why?
1. This will keep the subject meaningful. People will have some idea from the subject line if they should read it or not.
2. Not doing this greatly increases the number of emails that match search queries, but that aren't actually informative.
Today's Topics:
1. Re: Lift Over bam files (Jennifer Jackson)
2. Linking to Compressed Data (Branden Timm)
3. Re: How to decide "Mean Inner Distance between Mate Pairs"? (Jennifer Jackson)
4. Can I convert paired-end datasets into single end ones? (Du, Jianguang)
5. Re: Can I convert paired-end datasets into single end ones? (Jennifer Jackson)
6. Re: Galaxy toolshed-vcftools (Jennifer Jackson)
7. Do I need to allow indel search? (Du, Jianguang)
8. Use Own Junctions or not (Du, Jianguang)
9. Re: copy number variation detcetion in Glaxay (Jennifer Jackson)
10. Cuffdiff errors (Yan He)
11. Re: Cuffdiff errors (Jennifer Jackson)
12. Re: copy number variation detcetion in Glaxay (Mathew Bunj)
13. Re: Do I need to allow indel search? (Jennifer Jackson)
----------------------------------------------------------------------
Message: 1
Date: Wed, 15 Aug 2012 09:05:41 -0700
From: Jennifer Jackson <jen@bx.psu.edu>
To: Geert Vandeweyer <geertvandeweyer@gmail.com>
Cc: galaxy-user@lists.bx.psu.edu
Subject: Re: [galaxy-user] Lift Over bam files
Message-ID: <502BC8D5.8020804@bx.psu.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hello Geert,
For the best results, especially for SNPs, you will want to map directly to the target genome. The genome Galaxy is using is the same primary human genome the GATK team also uses - the 1000 genomes build 37 -> "hg_g1k_v37". Click on the GATK links from one of the tools to see the details. GATK provides liftOver files between the genomes, and you could install and use these with the liftOver tool, but not for BAM datasets. The accepted inputs are BED, Interval, and GFF, so a BAM dataset would first have to be converted (BAM -> SAM -> interval).
GATK also provides indexes (lifted) for hg19, but Galaxy does not provide an hg19 genome that is sorted appropriately for GATK, or at least not yet. RNA-seq tools and most other tools up until now required sorting in one way, and now GATK requires sorting in another, but keeping the database dbkey the same is important for visualization and other functions. It can get complicated when moving between tools in a history. We will likely have some 'best practice' solutions soon, but for now, use the 1000 genomes build to keep it all simple:
Human (Homo sapiens) (b37): hg_g1k_v37
The good news is that installing this genome has been greatly simplified. The genome and indexes are now available on an rsync server. You can simply download and add the genome directory and all the contents. You will still need to create the .loc file entries but the rest is done. http://wiki.g2.bx.psu.edu/Admin/Data%20Integration
The "dbkey" is "hg_g1k_v37"
Hopefully one of the options works out for you!
Jen
Galaxy team
ps: Your post ended up threading behind another post. I am not sure if this was because you started with a reply but changed the subject line? This is not enough to start a new thread. Instead, please create a brand new message in your email client, then copy over the mailing list email address and add a subject line. This will start a new thread that will get tracked and not missed. Thanks!
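The BAM -> SAM -> interval route mentioned in the reply above could be sketched roughly as follows. This is only an illustration, not how Galaxy's own converters work: it assumes the BAM has already been converted to SAM text, the function name `sam_line_to_bed` is hypothetical, and the reported interval spans the whole alignment on the reference, including deletions and skipped regions (CIGAR `D` and `N`).

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# CIGAR operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def sam_line_to_bed(line):
    """Return (chrom, start, end, name, strand) for a mapped SAM record.

    Returns None for header lines and unmapped reads.
    """
    if line.startswith("@"):
        return None  # SAM header line
    fields = line.rstrip("\n").split("\t")
    name, flag, chrom = fields[0], int(fields[1]), fields[2]
    pos, cigar = int(fields[3]), fields[5]
    if flag & 0x4 or chrom == "*" or cigar == "*":
        return None  # unmapped read, no interval to report
    # Length of the alignment on the reference, from the CIGAR string.
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar)
                  if op in REF_CONSUMING)
    start = pos - 1  # SAM is 1-based; BED is 0-based, half-open
    strand = "-" if flag & 0x10 else "+"
    return (chrom, start, start + ref_len, name, strand)
```

The resulting intervals are the kind of input a liftOver step accepts, but lifted read positions are not a substitute for remapping, as the reply notes.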
On 8/15/12 1:16 AM, Geert Vandeweyer wrote:
Hi all,
We have received some bam files that were aligned to hg18. What would be the easiest workflow to get a VCF file from GATK in build hg19? We are running a local galaxy with only hg19 for the moment. Lifting the bam file would be my first choice; providing support for hg18 by generating all indices would be my last :-)
Best regards,
Geert
The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at:
-- Jennifer Jackson http://galaxyproject.org
------------------------------
Message: 2
Date: Wed, 15 Aug 2012 11:09:37 -0500
From: Branden Timm <btimm@wisc.edu>
To: galaxy-user@lists.bx.psu.edu
Subject: [galaxy-user] Linking to Compressed Data
Message-ID: <502BC9C1.8090903@wisc.edu>
Content-Type: text/plain; CHARSET=US-ASCII; format=flowed
Hi All, Is it possible to link to compressed files in a Galaxy data library? We receive all of our NGS data in bz2 or gzip format for obvious reasons, just wondering if I have to decompress it on the filesystem before I link to it or not. Thanks!
-- Branden Timm btimm@glbrc.wisc.edu
------------------------------
Message: 3
Date: Wed, 15 Aug 2012 09:27:36 -0700
From: Jennifer Jackson <jen@bx.psu.edu>
To: Sean Davis <sdavis2@mail.nih.gov>, "Du, Jianguang" <jiandu@iupui.edu>
Cc: "galaxy-user@lists.bx.psu.edu" <galaxy-user@lists.bx.psu.edu>
Subject: Re: [galaxy-user] How to decide "Mean Inner Distance between Mate Pairs"?
Message-ID: <502BCDF8.4030005@bx.psu.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Great advice Sean!
Jianguang, this is the correct analysis - mapping the data to test the actual insert size of the library as sequenced. The experimental notes at SRA are just a starting place, the data is truth. A sample through TopHat itself might produce more precise results. I suspect the coverage on your top Blastn HSP is not complete, breaking off where it hits a splice. And that you have some bias for sequences/hits that cross junctions near ends. But overall, none of this would likely make that much of a difference in the analysis as a whole.
Good luck!
Jen
Galaxy team
On 8/15/12 8:39 AM, Sean Davis wrote:
On Wed, Aug 15, 2012 at 11:13 AM, Du, Jianguang <jiandu@iupui.edu> wrote:
Dear All,

I am analyzing some downloaded RNA-seq datasets, but I am not sure what the Mean Inner Distance between Mate Pairs is for these paired-end datasets. Take one paired-end RNA-seq dataset as an example. Its description in the SRA database at NCBI reads:

"Layout: PAIRED, Orientation: 5'-3'-3'-5', Nominal length: 400, Nominal Std Dev: 20"

At first I thought the Mean Inner Distance between Mate Pairs should be 325 bp, because the reads on both ends are 36 bp long. However, when I aligned the paired reads to transcripts and to the genome using BLASTn, the distance between the paired reads was about 200 bp. How should I decide the Mean Inner Distance between Mate Pairs in my case?
The information from SRA is likely only an approximation. SRA does not validate these details, I do not think.
You can probably use the distribution from your data as the best estimate.
Sean
Thanks. Jianguang Du
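The arithmetic behind this exchange can be written out as a small sketch. It assumes the SRA "Nominal length" is the full fragment length including both reads, which the thread does not confirm, and the function names are hypothetical. Under that assumption the nominal inner distance is 400 - 2 × 36 = 328 bp, close to the 325 bp mentioned in the question. The second function follows Sean's suggestion of using the distribution observed in the data instead.

```python
from statistics import mean, stdev


def nominal_inner_distance(fragment_len, read_len):
    """Inner distance, assuming fragment_len covers both reads plus the gap."""
    return fragment_len - 2 * read_len


def estimate_inner_distance(observed_fragment_lens, read_len):
    """Mean and standard deviation of the inner distance from mapped pairs.

    observed_fragment_lens: fragment lengths measured from the alignments
    (hypothetical input, e.g. outer distances between properly paired mates).
    """
    inner = [f - 2 * read_len for f in observed_fragment_lens]
    return mean(inner), stdev(inner)
```

For example, `nominal_inner_distance(400, 36)` gives 328. Fragment lengths measured on the genome include any introns between the mates, so for RNA-seq data the observed lengths are better taken from pairs aligned to transcripts or from pairs that do not span a junction.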
-- Jennifer Jackson http://galaxyproject.org
------------------------------
Message: 4
Date: Wed, 15 Aug 2012 16:59:27 +0000
From: "Du, Jianguang" <jiandu@iupui.edu>
To: "galaxy-user@lists.bx.psu.edu" <galaxy-user@lists.bx.psu.edu>
Subject: [galaxy-user] Can I convert paired-end datasets into single end ones?
Message-ID: <2B3C356FD95D6A41B0CCFF77102E5EDF12867830@IU-MSSG-MBX106.ads.iu.edu>
Content-Type: text/plain; charset="iso-8859-1"
Dear All,
I have some paired-end datasets to be analyzed, but I am not sure about their Mean Inner Distance between Mate Pairs.
Can I convert these paired-end datasets into single-end ones and use them as single-end datasets, as follows?
1) Use the tool "Manipulate FASTQ" to convert the sequences of the reverse reads into their reverse-complement counterparts, so that all of the reverse reads effectively become forward reads.
2) Run TopHat on the manipulated datasets as single-end ones.
Thanks.
Jianguang
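Step 1 of the question above amounts to reverse-complementing each reverse read. A minimal sketch of that operation on a single FASTQ record is shown below. It is not the implementation of the "Manipulate FASTQ" tool, and the function name is hypothetical. The quality string is reversed along with the sequence so that each score stays attached to its base.

```python
# Translation table for complementing bases; N stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement_record(name, seq, qual):
    """Return the FASTQ record with its sequence reverse-complemented.

    The quality string is reversed so each score still matches its base.
    """
    return name, seq.translate(COMPLEMENT)[::-1], qual[::-1]
```

For example, the record `("@r1/2", "AACGN", "IHGFE")` becomes `("@r1/2", "NCGTT", "EFGHI")`.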