July 2011 - galaxy-user - lists.galaxyproject.org

Citing Galaxy
by Sher, Falak 07 Jul '11

I did a ChIP-Seq analysis using Galaxy and was wondering which article is the most relevant one to cite for Galaxy in this kind of work (ChIP-Seq). Thank you in advance, Falak

BED Track Overlaps Count Table
by Dubay, Christopher 07 Jul '11

Hello, I have 11 BED tracks on hg19. I want an 11x11 table with counts of the overlaps between tracks. Does anyone have a workflow for doing this? I have not found one. Christopher (I need this now, so feel free to reply to christopher.dubay(a)providence.org as well as the listserv)
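
Outside of a Galaxy workflow, one way to build such a table is sketched below in Python. It is only an illustration under stated assumptions: the function names are made up for this example, "overlap count" is taken to mean the number of intervals in the row track that overlap at least one interval in the column track, and the pairwise scan is quadratic, so it suits modest track sizes only.

```python
from itertools import product


def read_bed(path):
    """Read (chrom, start, end) tuples from a BED file, skipping header lines."""
    intervals = []
    with open(path) as handle:
        for line in handle:
            if not line.strip() or line.startswith(("track", "browser", "#")):
                continue
            chrom, start, end = line.split()[:3]
            intervals.append((chrom, int(start), int(end)))
    return intervals


def count_overlaps(a, b):
    """Count intervals in `a` that overlap at least one interval in `b`."""
    by_chrom = {}
    for chrom, start, end in b:
        by_chrom.setdefault(chrom, []).append((start, end))
    count = 0
    for chrom, start, end in a:
        # BED intervals are half-open, so intervals that only touch do not overlap.
        if any(start < b_end and b_start < end
               for b_start, b_end in by_chrom.get(chrom, [])):
            count += 1
    return count


def overlap_table(tracks):
    """tracks: dict of name -> interval list. Returns {(row, col): count}."""
    return {(row, col): count_overlaps(tracks[row], tracks[col])
            for row, col in product(tracks, repeat=2)}
```

With 11 tracks loaded through `read_bed` into a dict, `overlap_table` returns the 121 cells of the table. Note that the table is not necessarily symmetric under this definition, because one interval in a track can overlap several intervals in another.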

AAchange and SIFT tools
by Charlotte Gueydan 07 Jul '11

Hello, I have some sequencing results that I want to blast in order to identify which SNPs are known and which are not, and which variations lead to which effect (synonymous or non-synonymous mutation). In the end, I want a file with a single RefSeq transcript ID (preferably the longest transcript) and all the variations identified for this gene, with, for each SNP, the position (coding / non-coding), the consequence (synonymous / non-synonymous), the amino acid substitution, etc.

To do this I used the SIFT tool on my results, but it seems to choose the transcript it refers to at random. For example, when I enter these variations:

chr19 39191323 + C/T
chr19 39191733 + C/T
chr19 39195653 + C/T
chr19 39196688 + C/T
chr19 39196736 + G/A
chr19 39196745 + C/T
chr19 39207742 + G/A
chr19 39214633 + C/T
chr19 39215172 + T/C
chr19 39215193 + G/A
chr19 39218649 + G/A
chr19 39219780 + T/C
chr19 39220016 + G/A
chr19 39220279 + A/T

the SIFT tool identifies 13 SNPs in the transcript ENST00000252699 (9 novel and 4 already known) and 1 in ENST00000445727, although both refer to the same gene: ACTN4! I don't understand why this is happening, or how I can force the SIFT tool to "blast" the SNPs against only one transcript (preferably the one corresponding to the longest isoform).

I have a second problem in the same area. Because the SIFT tool gave me multiple Ensembl transcript IDs for the same gene, I tried the AAchange tool. Once I had the results, I compared them with those from the SIFT tool. Surprisingly, I found a systematic offset in the mutated codon between the two tools! I checked 1 or 2 of the SNPs found with the SIFT tool to see which result was correct, and according to the dbSNP database the SIFT tool is right, even if it doesn't always use the RefSeq transcript.

So my question is: why do the two tools identify different affected codons (even when they use the same transcript), and how can I force the AAchange tool to start the analysis at the correct nucleotide (to get rid of the offset)? In case it helps, here is the link showing what I did and what I got: http://main.g2.bx.psu.edu/u/charlotte/h/sift-vs-aa-change <http://main.g2.bx.psu.edu/history/sharing?id=dbac2dd575f62542#>

If you could help me with these issues it would be great for me (my work and my boss)! Thanks, Charlotte

Charlotte Gueydan, PhD. INSERM/Université Pierre et Marie Curie Hôpital Tenon - bâtiment de recherche 4, rue de la Chine 75020 Paris cedex 20 tel : 01 56 01 83 75
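
If the tool itself cannot be restricted to one transcript, a possible workaround is to filter its output afterwards. The Python sketch below keeps, for each gene, only the rows on the longest transcript. It makes several assumptions: the row layout (`gene`, `transcript`, `variant` keys) and the function name are hypothetical, and the transcript lengths must come from a separate annotation source. Rows reported on other transcripts are dropped, not re-annotated.

```python
def keep_longest_transcript(rows, lengths):
    """Keep only rows whose transcript is the longest one seen for its gene.

    rows    -- list of dicts with (hypothetical) 'gene' and 'transcript' keys
    lengths -- dict mapping transcript ID -> transcript length
    """
    best = {}
    for row in rows:
        gene, transcript = row["gene"], row["transcript"]
        current = best.get(gene)
        if current is None or lengths.get(transcript, 0) > lengths.get(current, 0):
            best[gene] = transcript
    return [row for row in rows if row["transcript"] == best[row["gene"]]]


# Example with placeholder lengths (illustrative values, not real annotation).
rows = [
    {"gene": "ACTN4", "transcript": "ENST00000252699", "variant": "chr19:39191323 C/T"},
    {"gene": "ACTN4", "transcript": "ENST00000445727", "variant": "chr19:39220279 A/T"},
]
lengths = {"ENST00000252699": 2000, "ENST00000445727": 1000}
filtered = keep_longest_transcript(rows, lengths)
```

This only tidies the output table. It does not explain the codon offset between the two tools.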

Weblogo results empty
by Holger Klein 07 Jul '11

Dear all, I have a problem with the weblogo tool. I have a clustalw alignment in fasta format that I'd like to visualize as a logo. The sequence logo module ends with a success (green box), and the info tells me the amount and length of the input data. But the output is empty: there are no plots, no matter whether I select jpg, png, pdf or text. The respective image can't be displayed "because it contains errors", or is empty in the case of text. I suspect that the actual call of the weblogo tool doesn't succeed, but I haven't yet figured out how to check this. Does anybody have hints on where to look? Cheers, Holger -- Dr. Holger Klein Core Facility Bioinformatics Institute of Molecular Biology gGmbH (IMB) http://www.imb-mainz.de/ Tel: +49(6131) 39 21511
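
A first check that does not depend on Galaxy internals is to download the output dataset and look at the file itself. The sketch below is a hedged illustration (the function name and return strings are made up for this example): it reports whether a file is empty and whether it begins with the standard signature bytes of the chosen format.

```python
import os

# Standard leading bytes ("magic numbers") for each output format.
SIGNATURES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "jpg": b"\xff\xd8\xff",
    "pdf": b"%PDF",
}


def check_output(path, fmt):
    """Return 'empty', 'ok', or 'unexpected content' for a downloaded output file."""
    if os.path.getsize(path) == 0:
        return "empty"
    with open(path, "rb") as handle:
        head = handle.read(8)
    return "ok" if head.startswith(SIGNATURES[fmt]) else "unexpected content"
```

An "unexpected content" result is worth opening in a text editor, since a file that is not a valid image may contain readable text that hints at what went wrong.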

macs14 with Galaxy
by Hersh 07 Jul '11

Hi, has anyone tried configuring the latest version of MACS with a local Galaxy server? Regards, Hersh

Re: [galaxy-user] Looking for new transcripts with cufflinks
by David Matthews 06 Jul '11

Of course - Doh! Many thanks!!

On 4 Jul 2011, at 18:47, Oliver, Gavin wrote:
> "u" represents unknown intergenic transcripts.
>
> -----Original Message-----
> From: galaxy-user-bounces(a)lists.bx.psu.edu on behalf of David Matthews
> Sent: Mon 04/07/2011 17:48
> To: galaxy-user(a)lists.bx.psu.edu
> Subject: [galaxy-user] Looking for new transcripts with cufflinks
>
> Hi,
>
> I am working with HeLa cells and want to know how to get cufflinks etc. to highlight if a region of the genome is being transcribed that is not in the ensembl gtf. I know that cufflinks highlights with class code "j" regions that do not match a known gene and therefore may be novel but most of these arise from transcription on or near known genes. Does anyone know how to look for transcription that is clearly distinct from known genes? This is a wild goose chase but worth a peek just in case...
>
> Best Wishes,
> David.
>
> Dr David A. Matthews
> Senior Lecturer in Virology
> Room E49
> Department of Cellular and Molecular Medicine,
> School of Medical Sciences
> University Walk,
> University of Bristol
> Bristol.
> BS8 1TD
> U.K.
>
> Tel. +44 117 3312058
> Fax. +44 117 3312091
>
> D.A.Matthews(a)bristol.ac.uk
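
Following up on the class code "u" mentioned above, the sketch below shows one way to pull out those transcripts in Python. It assumes the input is a GTF whose ninth column carries `transcript_id` and `class_code` attributes, as in a Cuffcompare combined GTF. The function name is made up for this example.

```python
import re

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')
CLASS_CODE = re.compile(r'class_code "([^"]+)"')


def transcripts_with_class_code(gtf_lines, code="u"):
    """Return sorted transcript IDs whose class_code attribute equals `code`."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attributes = fields[8]
        class_match = CLASS_CODE.search(attributes)
        id_match = TRANSCRIPT_ID.search(attributes)
        if class_match and id_match and class_match.group(1) == code:
            ids.add(id_match.group(1))
    return sorted(ids)
```

Passing an open file handle as `gtf_lines` works, since the function only iterates over lines. Calling it with `code="j"` gives the other class for comparison.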

Questions on CuffDiff Output and Browser Visualization
by Kurinji Pandiyan 06 Jul '11

I would really appreciate someone's input on some issues I am having with my cuffdiff output for RNAseq data and how it compares with genes I have viewed on IGV (or a similar browser):

1. When I look at my differentially expressed transcripts file (generated using ensembl hg19 as a reference, with chr added on to obtain results with ensembl gene names) and search for specific genes that I am interested in, I cannot find them in my cuffdiff output file, even though I can visualize these genes on IGV and they look obviously differentially regulated. Also, given that the cuffdiff output for differentially expressed transcripts does list all transcripts, including the ones that have not significantly changed, wouldn't transcripts for these genes be listed anyway, even if my visual ballparking of differential regulation is not statistically significant? I would really like to know why I am missing genes from my cuffdiff output.

2. Do you all get a good correlation between the top differentially expressed transcripts/genes generated from cuffdiff and how the data looks when visualized on IGV, i.e. do your upregulated transcripts really look upregulated when visualizing? I found that while some validate visually, some do not, which is confusing.

3. When visualizing on a browser, if different transcripts for one gene are regulated differently, i.e. some are up in your treated sample but some are down for the same gene, how can you tell which transcript ID from cuffdiff corresponds with what you are seeing?

Thank you all for your help! Kurinji Graduate Student Johns Hopkins School of Medicine
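
For the first question, a systematic lookup can be more reliable than searching the file by eye. The Python sketch below is only an illustration: it assumes the cuffdiff output is a tab-separated file with a header row and a column named `gene` (the column name is an assumption and may differ between versions, so it is a parameter), and that this column may hold several comma-separated names. The function name is made up for this example.

```python
import csv


def find_genes(diff_path, genes_of_interest, gene_column="gene"):
    """Return (found, missing): rows per matched gene, and genes with no row."""
    wanted = {gene.upper() for gene in genes_of_interest}
    found = {}
    with open(diff_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            # A single row may list several gene names separated by commas.
            names = {name.strip().upper() for name in row[gene_column].split(",")}
            for gene in wanted & names:
                found.setdefault(gene, []).append(row)
    missing = sorted(wanted - set(found))
    return found, missing
```

Genes that end up in `missing` are absent from the file under that name. Comparing them against the gene names in the reference GTF would show whether this is a naming mismatch or whether the gene was never tested.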

Error aligning sequence
by A Bierzynska 06 Jul '11

Hi, I was wondering if you could help me understand an error that I get when mapping using Galaxy:

An error occurred running this job: Error aligning sequence.
Warning: Exhausted best-first chunk memory for read BIO-SEQUENCER1_0011:1:1:9519:2064#CGATGT/1 (patid 703); skipping read
Warning: Exhausted best-first chunk memory for read BIO-SEQUENCER1_0011:1:1:1795:6806#CGATGT/1 (patid 4461);

If you know what the error may mean, could you please let me know? Best Wishes, Agnieszka ---------------------- A Bierzynska Academic Renal Unit University of Bristol bizab(a)bristol.ac.uk

strand specificity
by Aleks Schein 06 Jul '11

Dear All, I am trying to analyze my RNAseq data with the TopHat-Cufflinks package in Galaxy. I used the Epicentre ScriptSeq strand-specific library construction protocol, which, I assume, produces a second-strand library. When I set the FR option to "second strand" and run TopHat and Cufflinks, the resulting transcript orientation is confusing. Cufflinks-assembled transcripts with multiple exons are oriented as expected. However, transcripts that reside entirely in one exon/intron or intergenic region are labeled the opposite way: if the TopHat accepted hits track contains negative-strand reads (colored red), the transcript is labeled as "positive", and vice versa. An example is shown in the attached screenshot. Both transcripts TCONS_00000014 and TCONS_00000015 were assigned to the "+" strand, though the reads that represent them had been mapped to the "-" strand. Is there a way to solve this issue? Regards, Aleks Schein

problem with displaying tracks from Galaxy
by Sergei Ryazansky 06 Jul '11

Hello all, we have a UCSC genome browser mirror as well as a Galaxy mirror. Galaxy has a feature that lets a user display data in the UCSC genome browser as custom tracks. I have configured Galaxy to display data in our UCSC browser mirror, but it doesn't work properly: after the redirect to the genome browser page, the error message "redirected to non-http(s): /root" appears. At the same time, displaying Galaxy data at the official UCSC site works fine. What are the possible reasons for this? Thank you in advance!
