August 2012 - galaxy-user - lists.galaxyproject.org

Galaxy toolshed-vcftools
by Mahtab Mirmomeni 15 Aug '12

Hi, I was wondering whether there is already a wrapper for vcftools in Galaxy. I want to use the following vcftools functions, but I haven't found them in the Tool Shed.

Comparing (vcf-compare <http://vcftools.sourceforge.net/perl_module.html#vcf-compare>):

    vcf-compare A.vcf.gz B.vcf.gz C.vcf.gz

Concatenating (vcf-concat <http://vcftools.sourceforge.net/perl_module.html#vcf-concat>):

    vcf-concat A.vcf.gz B.vcf.gz C.vcf.gz | bgzip -c > out.vcf.gz

Thanks,
Mahtab

Can I convert paired-end datasets into single end ones?
by Du, Jianguang 15 Aug '12

Dear All, I have some paired-end datasets to analyze, but I am not sure of their Mean Inner Distance between Mate Pairs. Can I convert these paired-end datasets into single-end ones and use them as single-end datasets, as follows?

1) Use the tool "Manipulate FASTQ" to convert the sequence of the reverse reads into its reverse-complement counterpart, so that all of the reverse reads effectively become forward reads.
2) Run Tophat on the manipulated datasets as single-end ones.

Thanks.
Jianguang
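
For readers unsure what step 1 does to a read, here is a minimal Python sketch of reverse-complementing a single FASTQ record. It is only an illustration, not the implementation of the "Manipulate FASTQ" tool; the function name and the example record are hypothetical. Note that the quality string has to be reversed together with the sequence so that each score stays attached to its base.

```python
# Illustrative sketch only; not the code behind Galaxy's "Manipulate FASTQ".
# The function name and the example record are hypothetical.

# Translation table for the standard bases plus N (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement_record(header, seq, qual):
    """Return the FASTQ record with the sequence reverse-complemented.

    The quality string is reversed (not complemented) so that every
    quality score still lines up with the base it was called for.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    return header, seq.translate(_COMPLEMENT)[::-1], qual[::-1]


if __name__ == "__main__":
    # Made-up reverse read used purely as an example.
    print(reverse_complement_record("@read1/2", "ACGTTN", "IIHGF#"))
```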

How to decide "Mean Inner Distance between Mate Pairs"?
by Du, Jianguang 15 Aug '12

Dear All, I am analyzing downloaded RNA-seq datasets. However, I am not sure what the Mean Inner Distance between Mate Pairs is for these paired-end datasets. Take one paired-end RNA-seq dataset as an example. Its description in the NCBI SRA database reads: "Layout: PAIRED, Orientation: 5'-3'-3'-5', Nominal length: 400, Nominal Std Dev: 20". At first I thought the Mean Inner Distance between Mate Pairs should be 325 bp, because the reads on both ends are 36 bp long. However, when I aligned the paired reads to transcripts and the genome using BLASTn, the distance between the paired reads was about 200 bp. How should I decide the Mean Inner Distance between Mate Pairs in my case?

Thanks.
Jianguang Du
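
The arithmetic behind the first estimate can be written out as a small Python sketch. It assumes that the SRA "Nominal length" is the full fragment length without adapters, which the post does not confirm, and the function name is hypothetical. Under that assumption the inner distance is the fragment length minus both read lengths, which for the numbers quoted gives 328 bp, close to the roughly 325 bp estimate in the post.

```python
# Sketch under an assumption: "Nominal length" is taken to be the full
# fragment length (both reads plus the unsequenced middle, no adapters).
# The function name is hypothetical.

def mean_inner_distance(fragment_length, read_length_1, read_length_2=None):
    """Fragment length minus the two read lengths.

    If only one read length is given, both mates are assumed to have it.
    A negative result would mean the two mates overlap.
    """
    if read_length_2 is None:
        read_length_2 = read_length_1
    return fragment_length - read_length_1 - read_length_2


if __name__ == "__main__":
    # Numbers quoted in the post: nominal length 400, reads of 36 bp.
    print(mean_inner_distance(400, 36))
```

The same function can be applied to a fragment length measured from alignments, but only if the measured distance is the outer span from the start of one mate to the end of the other. If the roughly 200 bp seen with BLASTn is already the gap between the two reads, it is the inner distance itself and nothing should be subtracted.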

picard bam statistic
by i b 15 Aug '12

Hi, I have just used this tool, picard bam statistic, for the first time. I aligned my reads to a custom genome (7904 bp long) and got the following output:

    length= 7904
    Aligned= 44
    Unaligned= 0

Why is Unaligned = 0? I also got Unaligned = 0 when aligning against hg19.

Thanks,
ib

how to sort mapped data?
by Yan He 15 Aug '12

Hi everyone, I am working on RNA-seq data. First, I mapped the reads to the reference transcriptome using bowtie. I found that some different reads mapped to the same gene at different positions. Before running Cufflinks, I would like to combine the reads that mapped to the same gene, even though they map to different positions. Is there a tool in Galaxy that can do this? Any suggestion would be much appreciated.

Thanks!
Yan

mapping reads to an shRNA library?
by Suzie Hight 15 Aug '12

Hello all, is there a way to use a library of shRNA sequences as my custom genome when using Bowtie? Currently my library is in multiple-sequence FASTA format.

Thanks for any help!
-Suzie Hight
UT Southwestern Medical Center

Map with Bowtie or Tophat?
by i b 14 Aug '12

Hi all, what is the difference between using "NGS: Mapping -> Map with Bowtie for Illumina" and "NGS: RNA analysis -> Tophat for Illumina" when mapping reads against a reference/custom genome?

Thanks,
ib

Should I run FASTQ Groomer?
by Du, Jianguang 14 Aug '12

Dear All, I have some FASTQ datasets with a Phred+33 quality offset, and I have already assigned them the fastqsanger format. Do I need to run FASTQ Groomer on these datasets before I check the data quality with "Fastqc: Fastqc QC" and use "FASTQ Trimmer by column" to remove bad nucleotides at the 3' end of the reads? If I do need to run Groomer, should I select "Sanger" as the "Input FASTQ quality scores type:"?

Thanks.
Jianguang Du
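
As background on what "phred 33 offset" means, here is a minimal Python sketch that decodes a quality string and applies a common heuristic for spotting Phred+33 data. It is not part of FASTQ Groomer; both function names are hypothetical. The heuristic relies on the fact that the Phred+64 and Solexa+64 encodings never use characters below ASCII 59 (';'), so any such character points to an offset of 33. The absence of such characters proves nothing either way.

```python
# Illustrative sketch only; function names are hypothetical and this is
# not how FASTQ Groomer is implemented.

def decode_phred33(quality_string):
    """Convert a Sanger-style (Phred+33) quality string to integer scores."""
    scores = [ord(ch) - 33 for ch in quality_string]
    if any(score < 0 for score in scores):
        raise ValueError("character below '!' is not valid in Phred+33")
    return scores


def looks_like_phred33(quality_string):
    """Heuristic: True if some character can only occur with offset 33.

    Characters below ASCII 59 (';') cannot appear in Phred+64 or
    Solexa+64 data. A False result means "undetermined", not "offset 64".
    """
    return any(ord(ch) < 59 for ch in quality_string)


if __name__ == "__main__":
    example = "!5I"  # made-up quality string
    print(decode_phred33(example), looks_like_phred33(example))
```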

Which settings should be used to upload the mouse reference genome?
by Du, Jianguang 14 Aug '12

Dear All, I am going to search for alternative splicing events between datasets. I am not sure which settings to use for the mouse reference genome (mm9) when I upload it from UCSC Main. Would you please tell me the settings for:

1) group:
2) Track:
3) Table:
4) Output format:

Thanks.
Jianguang

Which reference genome should I select?
by Du, Jianguang 14 Aug '12

Dear All, I am going to run Tophat on mouse RNA-seq datasets. When I uploaded the datasets with the URL method, I chose "Mouse July 2007 (NCBI37/mm9) (mm9)" under Genome, so "database: mm9" appears in the brief description of each dataset in the history. My question is: when I run Tophat, under "Will you select a reference genome from your history or use a built-in index?", should I select "Use a built-in index" or "Use one from the history"?

Thanks.
Jianguang Du
