February 2014 - galaxy-user - lists.galaxyproject.org

Depth of coverage error
by Sandrine Imbeaud 12 Feb '14

Hi,

I am using the "Depth of coverage on BAM files" tool from NGS: GATK Tools but have run into a problem. Some of the BAM files are processed successfully, while others consistently fail with the same error (see below). All of the BAM files were generated with the same deep-sequencing analysis pipeline (MiSeq Reporter, PCR Amplicon workflow): alignment is done against Hs. hg19 and the manifest is focused on specific regions. Does someone know how to solve this problem?

Also, chromosomes X and Y appear to be excluded from the calculation. Is this expected, or is there a setting that includes both sex chromosomes?

Kind Regards / Sandrine

--
Sandrine Imbeaud
INSERM, UMR U-674, IUH
Université Paris Descartes
Génomique Fonctionnelle des tumeurs solides
27 rue Juliette Dodu
F75010 Paris, France
TEL: +33 (0)1 53 72 51 98
FAX: +33 (0)1 53 72 51 92
MOBILE: +33 (0)6 12 69 80 29
http://www.inserm-u674.net/

Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/tmp
[Mon Feb 10 07:08:03 CST 2014] net.sf.picard.sam.CreateSequenceDictionary REFERENCE=/tmp/tmp-gatk-hPNp7B/gatk_input.fasta OUTPUT=/tmp/tmp-gatk-hPNp7B/dict1033574942916796988.tmp TRUNCATE_NAMES_AT_WHITESPACE=true NUM_SEQUENCES=2147483647 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Mon Feb 10 07:08:03 CST 2014] Executing as g2main(a)roundup49.tacc.utexas.edu on Linux 2.6.32-431.1.2.0.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_40-b43; Picard version: 1.58(1057)
[Mon Feb 10 07:08:04 CST 2014] net.sf.picard.sam.CreateSequenceDictionary done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=2025324544
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 1.4-18-g80a4ce0):
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
##### ERROR
##### ERROR MESSAGE: Input files reads and reference have incompatible contigs: No overlapping contigs found.
##### ERROR reads contigs = [chrM, chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY]
##### ERROR reference contigs = [c1_27022371-27108741, c1_65310990-65312594, c1_103341958-103574198, c1_115256193-115258963, c11_533546-534529, c11_108093339-108239879, c12_25379959-25398502, c12_46123245-46302016, c12_49412540-49449288, c12_70910587-71031382, c12_121416337-121440487, c13_77618564-77901349, c16_337220-402825, c16_2097254-2138873, c17_7571510-7591038, c17_40474813-40500706, c19_10596592-10614570, c2_21224182-21267085, c2_178098515-178099162, c20_57484200-57484647, c3_41240709-41282027, c3_41265294-41266864, c3_47057688-47205620, c3_178928002-178948337, c4_74269729-74287296, c4_185308648-185395907, c5_1253083-1294978, c5_1294819-1295755, c5_55251637-55260189, c5_71402848-71505569, c6_36644019-36655290, c7_135242597-135333692, c7_140452856-140453348, c9_21967531-21994664, c9_135766517-135820181]
##### ERROR ------------------------------------------------------------------------------------------
mv: cannot stat `/galaxy-repl/main/files/007/581/dataset_7581988.dat.sample_summary': No such file or directory
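
The GATK message in this post lists both sets of contig names, so the mismatch can be checked by hand before rerunning the tool. Below is a minimal Python sketch of that check, assuming the names are copied out of the log; the helper name `overlapping_contigs` and the shortened lists are illustrative only and are not part of GATK or Galaxy. The reference names look like per-region names (possibly derived from the amplicon manifest) rather than hg19 chromosome names, which would be consistent with "No overlapping contigs found", but that reading is a guess from the log.

```python
def overlapping_contigs(reads_contigs, reference_contigs):
    """Return the contig names present in both lists, in reads order."""
    reference = set(reference_contigs)
    return [name for name in reads_contigs if name in reference]


# Shortened, hand-copied samples from the error message (illustrative only).
reads = ["chrM", "chr1", "chr2", "chrX", "chrY"]
reference = [
    "c1_27022371-27108741",
    "c11_533546-534529",
    "c9_135766517-135820181",
]

shared = overlapping_contigs(reads, reference)
if not shared:
    print("No contig name is shared between the BAM header and the reference.")
else:
    print("Shared contigs:", shared)
```

If the list of shared names is empty, the BAM file and the reference selected in the tool form do not describe the same sequences by name, and choosing a reference whose sequence names match the BAM header is the usual first thing to try.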

Picrust upload
by Ines 12 Feb '14

Hello,

I was trying to run a dataset through Picrust on the web-based Galaxy. I built the OTU table in Qiime as explained in http://picrust.github.com/picrust/tutorials/otu_picking.html#otu-picking-tu…, with the command line:

pick_closed_reference_otus.py -i $path/file.fasta -r $path/rep-set/97_otus.fasta -o $path/OTU_Table.biom

I uploaded the file with Get Data and want to run the Normalize by Copy Number step, but I get the following message:

History does not include a dataset of the required format / build.

I am not sure whether I need to specify a particular format when I generate the OTU table, or whether the problem lies elsewhere. I would really appreciate some advice. I attach the biom file generated in case you need to take a look.

Thank you very much in advance.

Questions regarding Circster visualization
by Friederike Dündar 12 Feb '14

Hi,

I just noticed that Circos plots are now incorporated into Galaxy's visualization methods, which is awesome! However, I'm a bit lost as to what kinds of data I can load into Circos and I cannot find much documentation, so I hope you guys can help out.

1. I tested it using a bigWig and a BED file. Both loaded nicely in Circos, but I was surprised to see that the visualization of both files looked exactly the same, i.e. both file types seemed to be interpreted as histograms/coverage data. From the Circos plots I've seen in publications, I assumed that BED files would be visualized as straight lines indicating genome regions (rather than coverage). Am I doing anything wrong? Or, rather, how should I modify the BED file so that its content is simply interpreted as genomic regions?

2. In the Galaxy publication (www.biomedcentral.com/1471-2164/14/397) "line data" is mentioned for displaying connecting lines in the center of the circle. Could you give me an example line showing how this kind of data needs to be formatted?

It would be great to make much more use of Circster! Thanks a lot!

Best wishes,
Friederike

Re: [galaxy-user] Help with Cuffdiff
by Jennifer Jackson 11 Feb '14

Hi Maria,

I didn't notice any obvious usage, format, or content problems with the Tuxedo pipeline execution in your history. Your protocol is right on track. That leaves data and parameter inputs to consider.

I did notice that you are mainly using defaults and omitting the reference annotation that Cuffdiff uses to generate the full complement of statistics. The "NOTEST" result indicates that the coverage is too shallow. You could follow the advice here and lower "-c". This is "Min Alignment Count:" and is set to "10" in your runs.

http://cufflinks.cbcb.umd.edu/faq.html#notest

Adding a reference annotation file could also potentially help. Aligned sequences may be falsely fragmenting without a reference transcript to help bind them together. But this is just a guess: I didn't examine any assembly regions. That is, however, something you could do. The UCSC Table Browser is one source for a GTF file.

Experimenting with other parameters, as you are doing, is also worthwhile. The manual covers these in detail, and there is always the tool author's Google group for detailed questions and advice.

Good luck with your project,

Jen
Galaxy team

On 2/6/14 12:38 PM, Maria Hoffman wrote:
> Hello,
>
> Thank you for your help. I have found that wiki page very helpful and
> actually use it very often (I was using it this AM too before I emailed
> you). In looking at the wiki again, nothing is really standing out to
> me (my chromosome notation matches up etc). I am going to keep
> looking etc but I did send you my history too. I did try running
> another cuffdiff playing with the dispersion estimation method too out
> of curiosity.
>
> Thank you so much for your help! This is my first real data set doing
> this and we have abstracts due soon, so the pressure is on!
>
> Thanks!
> Maria

--
Jennifer Hillman-Jackson
http://galaxyproject.org

Feb 10, 2014 Galaxy Distribution & News Brief
by Jennifer Jackson 11 Feb '14

Feb 10, 2014 Galaxy Distribution & News Brief <https://wiki.galaxyproject.org/News/2014_02_10_Galaxy_Distribution>

*Complete News Brief <https://wiki.galaxyproject.org/DevNewsBriefs/2014_02_10>*

*Highlights:*

* Visualization upgrades, including Trackster CSS styling
* Multiple Tools migrated to the Tool Shed for a leaner distribution
* Redesign of UI rendering: new icons, new font, history pane updates
* API functionality upgrades featuring a new master admin API key and
* Tool Shed updates with a focus on repository metadata, displays, installs, and tests
* Over 35 new community contributions added

http://getgalaxy.org
http://bitbucket.org/galaxy/galaxy-dist
http://galaxy-dist.readthedocs.org

new:
    $ hg clone https://bitbucket.org/galaxy/galaxy-dist#stable

upgrade:
    $ hg pull
    $ hg update release_2014.02.10

Thanks for using Galaxy!

The Galaxy Team <https://wiki.galaxyproject.org/Galaxy%20Team>

Re: [galaxy-user] Finding constitutive exons using expression data (7plusorminus 3)
by Sébastien Vigneau 10 Feb '14

Hi 7plusorminus 3,

One possibility is to use the "group" tool with the "max" operation to get the highest expressed exon for each gene. Then you may use "subtract datasets" to remove the highest expressed exons from the original dataset, and iterate to get the second highest expressed exons (which are now the highest expressed exons). "Group" may also help you get the exons with the most proximal or distal start position (whether that is 5' or 3' depends on the orientation of the gene).

Alternatively, if you know how to use R, you can use the function "by" (here is a good explanation: http://nsaunders.wordpress.com/2010/08/20/a-brief-introduction-to-apply-in-… ).

Sébastien

----------------------------------------
Message: 1
Date: Sun, 9 Feb 2014 16:43:14 -0500
From: 7plusorminus 3 <7plusorminus3(a)gmail.com>
To: galaxy-user(a)lists.bx.psu.edu
Subject: [galaxy-user] Finding constitutive exons using expression data
Message-ID: <CALfFDirXk1LYY6t+JHnnRCfOGiqcaENubBZ3J_movCmV6bRUSg(a)mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi,

I'm trying to find over the entire human genome, for each gene, which exons are the most constitutively expressed. To do this, I'd like to combine expression data (RNA-seq or Microarray) and exons data (UCSC track). Then, for each gene, I'd like to pick the 1 or 2 exons with the highest levels of expression (my proxy for constitutiveness).

An additional nicety would be to somehow work in a preference for 5' exons. For example, let's say a gene has 3 exons and, with the expression data, all 3 exons are equally expressed. I'd like to selectively get the first 2 exons.

I've started learning Galaxy and was able to import BED files for UCSC exons (as in the Galaxy 101 tutorial) and a BED file for Affy microarray expression data. (I tried also importing the Burge RNA-seq track as BED but couldn't get it to work). I did an inner join on genomic sequences to join the expression data with the exons and sorted them from most expressed to least. But how do I sort within genes? That is, how do I get the top 2 exons per gene (highest expressing exons per gene) and, if there are more than 2 with equally high expression, how do I preferentially get the 5' exons?

I'm also open to ways to do this without using Galaxy, etc. I want to do this for an entire genome, so I figured it would be good to have a Galaxy workflow, which I could then apply to other genomes as needed.

Thanks for any help
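
Since the original question was also open to approaches outside Galaxy, the "group, take the max, repeat" idea from the reply can be collapsed into a single sort per gene. The following is a minimal Python sketch, not a Galaxy tool: the function name `top_exons_per_gene`, the dictionary keys (`gene`, `start`, `end`, `strand`, `expression`), and the toy records are all assumptions made for illustration. Ties in expression are broken in favour of the more 5' exon, taking the lowest start on the + strand and the highest end on the - strand.

```python
from collections import defaultdict


def top_exons_per_gene(exons, n=2):
    """Pick the n highest-expressed exons per gene, preferring 5' exons on ties."""
    by_gene = defaultdict(list)
    for exon in exons:
        by_gene[exon["gene"]].append(exon)

    def sort_key(exon):
        # Smaller rank means closer to the 5' end of the gene.
        if exon["strand"] == "+":
            five_prime_rank = exon["start"]
        else:
            five_prime_rank = -exon["end"]
        # Highest expression first, then most 5' exon first.
        return (-exon["expression"], five_prime_rank)

    return {gene: sorted(group, key=sort_key)[:n] for gene, group in by_gene.items()}


# Toy records (hypothetical values): geneA has three equally expressed exons.
exons = [
    {"gene": "geneA", "start": 500, "end": 600, "strand": "+", "expression": 7.0},
    {"gene": "geneA", "start": 100, "end": 200, "strand": "+", "expression": 7.0},
    {"gene": "geneA", "start": 300, "end": 400, "strand": "+", "expression": 7.0},
    {"gene": "geneB", "start": 100, "end": 200, "strand": "-", "expression": 5.0},
    {"gene": "geneB", "start": 300, "end": 400, "strand": "-", "expression": 9.0},
    {"gene": "geneB", "start": 500, "end": 600, "strand": "-", "expression": 9.0},
]

top = top_exons_per_gene(exons)
for gene, picked in top.items():
    print(gene, [(e["start"], e["end"]) for e in picked])
```

For geneA the three-way tie resolves to the two exons with the lowest start coordinates; for geneB, on the minus strand, the two equally expressed exons are returned with the higher-coordinate one first. Real input would come from the joined exon and expression table rather than a hand-written list.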

Finding constitutive exons using expression data
by 7plusorminus 3 09 Feb '14

Hi,

I'm trying to find over the entire human genome, for each gene, which exons are the most constitutively expressed. To do this, I'd like to combine expression data (RNA-seq or Microarray) and exons data (UCSC track). Then, for each gene, I'd like to pick the 1 or 2 exons with the highest levels of expression (my proxy for constitutiveness).

An additional nicety would be to somehow work in a preference for 5' exons. For example, let's say a gene has 3 exons and, with the expression data, all 3 exons are equally expressed. I'd like to selectively get the first 2 exons.

I've started learning Galaxy and was able to import BED files for UCSC exons (as in the Galaxy 101 tutorial) and a BED file for Affy microarray expression data. (I tried also importing the Burge RNA-seq track as BED but couldn't get it to work). I did an inner join on genomic sequences to join the expression data with the exons and sorted them from most expressed to least. But how do I sort within genes? That is, how do I get the top 2 exons per gene (highest expressing exons per gene) and, if there are more than 2 with equally high expression, how do I preferentially get the 5' exons?

I'm also open to ways to do this without using Galaxy, etc. I want to do this for an entire genome, so I figured it would be good to have a Galaxy workflow, which I could then apply to other genomes as needed.

Thanks for any help

Constitutive exon workflow?
by 7plusorminus 3 08 Feb '14

Hi folks,

I'm trying to find over the entire human genome, for each gene, which exons are the most constitutively expressed. To do this, I'd like to combine expression data (RNA-seq or Microarray) and exons data (UCSC track). Then, for each gene, I'd like to pick the 1 or 2 exons with the highest levels of expression (my proxy for constitutiveness).

An additional nicety would be to somehow work in a preference for 5' exons. For example, let's say a gene has 3 exons and, with the expression data, all 3 exons are equally expressed. I'd like to selectively get the first 2 exons.

I've started learning Galaxy and was able to import BED files for UCSC exons (as in the Galaxy 101 tutorial) and a BED file for Affy microarray expression data. (I tried also importing the Burge RNA-seq track as BED but couldn't get it to work). I did an inner join on genomic sequences to join the expression data with the exons and sorted them from most expressed to least. But how do I sort within genes? That is, how do I get the top 2 exons per gene (highest expressing exons per gene) and, if there are more than 2 with equally high expression, how do I preferentially get the 5' exons?

I'm also open to ways to do this without using Galaxy, etc. I want to do this for an entire genome, so I figured it would be good to have a Galaxy workflow, which I could then apply to other genomes as needed.

Thanks for any help.

jim

How do I remove "Histories shared with you by others"?
by Casey Bergman 06 Feb '14

Hello,

I am trying to clean out my "Histories shared with you by others" page on the main public Galaxy server. I would like to remove old histories shared with me. I have tried two methods, neither of which allows me to remove these histories:

1. clicking the check box next to a history shared with me, then clicking "Unshare" at the bottom of the page
2. clicking the drop-down menu and then selecting "Unshare"

In both cases, I get the error message "History is not owned by the current user".

This behavior appears to be referenced in another recent thread from the sharer's side (http://user.list.galaxyproject.org/Trying-to-quot-unshare-quot-a-history-in…)

Any ideas on what is going on here?

Best regards,
Casey

Help with Cuffdiff
by Maria Hoffman 06 Feb '14

Hello,

I am new to Cuffdiff and just got my output back, and it doesn't look like anything is statistically significant. There are three treatment groups with two biological replicates in each group. I am not sure whether I made an error somewhere along the line, need to adjust the parameters, or whether there really could be no change. The samples are from sheep and I have been using the OvisAries3.0 reference I downloaded from UCSC.

Thanks
Maria
