When mapping paired-end RNA-seq reads with TopHat, we need to enter the "Mean Inner Distance between Mate Pairs". In Galaxy, the help text reads:
This is the expected (mean) inner distance between mate pairs. For example, for paired end runs with fragments
selected at 300bp, where each end is 50bp, you should set -r to be 200. There is no default, and this parameter
is required for paired end runs.
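The documented example works out arithmetically as follows; a minimal sketch (the function name is my own, not part of TopHat or Galaxy):

```python
# Worked example of the -r value from the Galaxy/TopHat help text above:
# the inner distance is the fragment length minus the length of both reads.
# The 300 bp / 50 bp numbers are the ones from the documentation.

def mean_inner_distance(fragment_len, read_len):
    """Inner distance between mates = fragment length - both read lengths."""
    return fragment_len - 2 * read_len

print(mean_inner_distance(300, 50))  # -> 200, matching the docs
```

Whether the quoted fragment size already excludes adaptor sequence is exactly the question below; the sketch only shows the arithmetic the documentation uses.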
I think the fragment size (here 300bp) includes not only the two paired-end reads but also the adaptors. So perhaps the Mean Inner Distance between Mate Pairs should be: fragment length - paired-end read lengths - adaptor length. Am I right, or did I miss something?
Is it necessary to enter an exact value?
Looking forward to your reply
We used the Generate pileup tool with the consensus base calling option,
i.e. Maq with default options. In the output, the base quality scores were
changed. For example, if the input base qualities in the SAM file were
'IIJIH', they were changed to '22321'. Is this a glitch or is this
expected? Thank you.
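For comparison, the two strings can be decoded under the usual Phred+33 (Sanger) encoding; whether the pileup column actually holds consensus qualities rather than the input base qualities is the open question here, this sketch only shows what the characters encode:

```python
# A hedged sketch: decode ASCII quality strings assuming Phred+33
# (Sanger) encoding, to compare the SAM base qualities with the
# characters appearing in the pileup output.

def phred33(qual_string):
    """Convert an ASCII quality string to a list of Phred scores."""
    return [ord(c) - 33 for c in qual_string]

print(phred33("IIJIH"))  # -> [40, 40, 41, 40, 39]
print(phred33("22321"))  # -> [17, 17, 18, 17, 16]
```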
After mapping, I used IGV to have a look at the alignments. There are a lot of mapped reads whose mates did not map. Should I keep these reads, or is this a problem for the Cufflinks analysis?
What I tried is:
1. BAM to SAM
2. Filter SAM: set Read mapped in a proper pair to Yes.
The result is that only about 1/5 of the reads were left.
Can anybody tell me whether this operation is appropriate?
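The "proper pair" condition in step 2 corresponds to bit 0x2 of the SAM FLAG field (column 2); a minimal sketch of that test (the example FLAG values are hypothetical):

```python
# "Read mapped in a proper pair" at the SAM level is bit 0x2 of the
# FLAG field (column 2 of each alignment line).

PAIRED      = 0x1  # read is paired in sequencing
PROPER_PAIR = 0x2  # both mates aligned consistently

def is_proper_pair(flag):
    return bool(flag & PROPER_PAIR)

# 99  = paired + proper pair + mate on reverse strand + first in pair
# 89  = paired, first in pair, mate unmapped -> NOT a proper pair
print(is_proper_pair(99))  # -> True
print(is_proper_pair(89))  # -> False
```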
How do you normally optimize the mapping results from TopHat?
Which considerations should I take into account?
Looking forward to your reply.
I am analyzing my RNA-Seq data. After running Cuffdiff, I got a list of differentially expressed transcripts and genes.
As far as I know, the log2 value represents the fold change. However, some of the values are negative. Is this possible? I did not think a log2 fold change value could be negative. Did I miss something?
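For what it's worth, a quick check of the arithmetic: the log2 of a fold change below 1 (i.e. lower expression in the second condition) is negative:

```python
import math

# log2 fold change for a ratio above 1 is positive, and for a ratio
# below 1 it is negative; the fold-change ratio itself stays positive.

print(math.log2(4.0))   # 4-fold up   -> 2.0
print(math.log2(0.25))  # 4-fold down -> -2.0
```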
Looking forward to your help.
Thanks in advance.
Earlier I tried to upload some large BAM files (3.5GB, 3.4GB and
4GB) to my Galaxy account, but failed. Your
advice was to use FTP upload at http://wiki.g2.bx.psu.edu/FTPUpload.
I followed the screencast on that page and did exactly as it advised. I
used the FileZilla FTP client, uploaded the files to my Galaxy account and
executed them. The problem is now at the execution step. For example, my
3.5GB file is uploaded correctly, but once I execute it, the file I get is
only 2.5GB. The file seems to have been truncated somehow. Please advise!
I have a SAM file produced by BWA-SW and want to extract the unique
alignments (reads that align only once to the genome) from it. I read in
other posts that I may be able to use the SAM Tools > Filter SAM option to
filter bit-wise on the FLAG field. However, I could not work out whether I
have to use the default setting for column 2. When I use the "add flags"
option, the choices all concern paired reads, but my data is single-end.
So how exactly do we extract unique reads from a single-end alignment SAM
file? Am I missing something? I can also share my history to explain my
point if required.
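One common approach for single-end data is sketched below: drop unmapped (0x4) and secondary (0x100) alignments, then keep reads whose name occurs exactly once. Aligner conventions for marking repeated hits vary, so treat this as an illustration rather than the definitive BWA-SW rule; the in-line SAM records are hypothetical:

```python
# Hedged sketch: extract "unique" single-end alignments from SAM text.
# Drop unmapped (flag 0x4) and secondary (flag 0x100) records, then
# keep only read names that appear exactly once among the survivors.

from collections import Counter

def unique_single_end(sam_lines):
    kept = []
    for line in sam_lines:
        if line.startswith("@"):            # skip header lines
            continue
        fields = line.split("\t")
        flag = int(fields[1])               # column 2: FLAG
        if flag & 0x4 or flag & 0x100:      # unmapped / secondary
            continue
        kept.append(fields)
    counts = Counter(f[0] for f in kept)    # count by read name (QNAME)
    return [f for f in kept if counts[f[0]] == 1]

# Hypothetical records: readA aligns twice, readB once.
sam = [
    "readA\t0\tchr1\t100\t0\t50M\t*\t0\t0\t*\t*",
    "readA\t0\tchr2\t900\t0\t50M\t*\t0\t0\t*\t*",
    "readB\t0\tchr1\t500\t37\t50M\t*\t0\t0\t*\t*",
]
print([f[0] for f in unique_single_end(sam)])  # -> ['readB']
```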
Hello, I am using the Subtract (whole dataset) tool. I converted my FASTQ
file to tabular with two columns: 1. identifier and 2. sequence. I then
used "Select lines that match an expression" on this initial tabular
file, and I am trying to get a final dataset that is devoid of the reads
in the few selected lines - thus I subtract the dataset of selected lines
from the initial dataset. This tool works when I run the workflow on a
relatively small file (1/50 the size of a whole sequencing experiment) but
repeatedly fails when I input the full FASTQ file. Any idea why this is so?
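For reference, the subtraction step itself can be expressed as a set difference on the identifier column; this is a sketch of the intended operation on hypothetical data, not of Galaxy's implementation:

```python
# Sketch of the intended operation: remove from the full tabular
# dataset every row whose identifier appears in the selected-lines
# dataset. A set of identifiers keeps the membership test O(1) per row.

def subtract(full_rows, selected_rows):
    remove = {row[0] for row in selected_rows}   # identifiers to drop
    return [row for row in full_rows if row[0] not in remove]

# Hypothetical (identifier, sequence) rows.
full = [("read1", "ACGT"), ("read2", "GGCC"), ("read3", "TTAA")]
sel  = [("read2", "GGCC")]
print(subtract(full, sel))  # -> [('read1', 'ACGT'), ('read3', 'TTAA')]
```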
Hi Galaxy Developers,
This is regarding the wrapper for cuffdiff and the -c parameter for "Min
alignment count". We noted that Galaxy's default value is set to 1000. In
our experience, setting this parameter that high yields drastically
different results from the ones obtained with the author's original default
value of c = 10.
The -c parameter is defined as:
-c/--min-alignment-count <int> The minimum number of alignments in a locus
needed to conduct significance testing on changes in that locus observed
between samples. If no testing is performed, changes in the locus are deemed
not significant, and the locus's observed changes don't contribute to
correction for multiple testing. The default is 10 fragment alignments.
We have found that the value entered will require that at least this number
of fragment alignments be found for *each* sample (or at least each
condition) in order for the locus to be tested for differential expression.
Although one could save oneself a lot of locus comparisons (and hence
take a much smaller hit from multiple hypothesis testing) by raising
this threshold, doing so screens out the most desirable differentially
expressed genes - specifically, those that are barely expressed in one
condition but highly expressed in the other.
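The screening behaviour, as we understand it from the observation above, can be sketched like this (the per-sample counts are hypothetical):

```python
# Sketch of the screening behaviour described above, as we understand
# it: a locus is tested only if *each* sample (or condition) meets the
# -c threshold, so a gene with very few alignments in one condition
# stops being tested as soon as the threshold exceeds that low count.

def is_tested(counts_per_sample, min_alignment_count):
    return all(c >= min_alignment_count for c in counts_per_sample)

locus = [15, 850]  # barely expressed in condition A, high in B
print(is_tested(locus, 10))    # -> True  (tested with cufflinks default)
print(is_tested(locus, 100))   # -> False (screened out)
print(is_tested(locus, 1000))  # -> False (screened out at Galaxy default)
```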
The attached image shows a real example of a gene that was counted as
differentially expressed (as it should have been) with '10' but was not
even tested at '100' or '1000'.
I got the following error message during TopHat mapping:
An error occurred running this job: Job output not returned by PBS: the output datasets were deleted while the job was running, the job was manually dequeued or there was a cluster error.
Please let me know what's wrong.
Help will be appreciated.