I have 11 BED tracks on hg19. I want an 11x11 table with counts of the overlaps between tracks.
Does anyone have a workflow for doing this? I have not found one.
(I need this now, so feel free to reply to christopher.dubay(a)providence.org as well as the listserv)
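One possible way to build such a table outside of any workflow tool is sketched below. It assumes each track fits in memory, that coordinates are standard BED (0-based, half-open), and that "overlap count" means the number of intervals in the row track that overlap at least one interval in the column track. The function names are illustrative, not part of Galaxy or any BED library.

```python
from bisect import bisect_left
from collections import defaultdict
from itertools import accumulate


def read_bed(lines):
    """Parse BED lines into {chrom: [(start, end), ...]}."""
    track = defaultdict(list)
    for line in lines:
        # Skip blank lines and UCSC header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        track[chrom].append((int(start), int(end)))
    return track


def build_index(track):
    """Per chromosome: sorted starts plus the running maximum of ends."""
    index = {}
    for chrom, intervals in track.items():
        intervals = sorted(intervals)
        starts = [s for s, _ in intervals]
        max_ends = list(accumulate((e for _, e in intervals), max))
        index[chrom] = (starts, max_ends)
    return index


def count_overlaps(track_a, index_b):
    """Count intervals of A that overlap at least one interval of B."""
    count = 0
    for chrom, intervals in track_a.items():
        if chrom not in index_b:
            continue
        starts, max_ends = index_b[chrom]
        for start, end in intervals:
            k = bisect_left(starts, end)  # B intervals with start < end of A
            if k and max_ends[k - 1] > start:
                count += 1
    return count


def overlap_matrix(tracks):
    """Return an NxN list of lists; row i, column j = A_i intervals hitting A_j."""
    indexes = [build_index(t) for t in tracks]
    return [[count_overlaps(a, idx) for idx in indexes] for a in tracks]
```

With 11 tracks loaded through read_bed, overlap_matrix returns the 11x11 table. Note that the matrix is not symmetric under this definition, since one interval in a track can overlap several in another.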
I have some sequencing results that I want to BLAST in order to identify
which variants are known SNPs and which are not, and which variations lead
to which effect (synonymous or non-synonymous mutation). In the end, I want a
file with a single RefSeq transcript ID (preferably the longest
transcript) and all the variations identified for this gene with, for
each SNP, the position (coding / non-coding), the
consequence (synonymous / non-synonymous), the amino acid substitution, etc.
To do this I ran the SIFT tool on my results. However, SIFT
seems to choose at random the transcript sequence it refers to. For
example, when I enter these variations
chr19 39191323 + C/T
chr19 39191733 + C/T
chr19 39195653 + C/T
chr19 39196688 + C/T
chr19 39196736 + G/A
chr19 39196745 + C/T
chr19 39207742 + G/A
chr19 39214633 + C/T
chr19 39215172 + T/C
chr19 39215193 + G/A
chr19 39218649 + G/A
chr19 39219780 + T/C
chr19 39220016 + G/A
chr19 39220279 + A/T
the SIFT tool identifies 13 SNPs in the transcript ENST00000252699 (9
novel and 4 already known) and 1 in ENST00000445727, although both
transcripts belong to the same gene: ACTN4! I don't understand why this is
happening, or how I can force the SIFT tool to "blast" the SNPs against only one
transcript (preferably the one corresponding to the longest isoform).
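If the tool itself cannot be restricted to one transcript, one workaround is to filter its output afterwards. The sketch below is only an illustration: it assumes the annotation rows have already been parsed into dicts with hypothetical 'gene' and 'transcript' keys, and that transcript lengths are available from a separate lookup (for example an Ensembl export). None of these names come from SIFT.

```python
def keep_longest_transcript(rows, transcript_length):
    """Keep only the rows annotated on the longest transcript of each gene.

    rows: list of dicts with at least 'gene' and 'transcript' keys (hypothetical).
    transcript_length: dict mapping transcript ID -> length in bases.
    """
    longest = {}
    for row in rows:
        gene, tx = row["gene"], row["transcript"]
        current = longest.get(gene)
        if current is None or transcript_length[tx] > transcript_length[current]:
            longest[gene] = tx
    # Rows on any other transcript of the same gene are dropped.
    return [row for row in rows if longest[row["gene"]] == row["transcript"]]
```

A variant that is reported only on a shorter isoform disappears with this filter, so it may be worth listing the dropped rows separately before discarding them.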
I have a second problem in the same area. Since I got
multiple Ensembl transcript IDs for the same gene from the SIFT
tool, I tried the AAchange tool. Once I had the results, I compared them
with those from the SIFT tool. Surprisingly, I found a
systematic shift in the mutated codon between these 2 tools! I checked 1
or 2 SNPs identified by the SIFT tool to see which result was
correct, and according to the dbSNP database the SIFT
tool is right, even though it doesn't always use the RefSeq transcript.
So my question is: why do they identify different affected codons (even when
they use the same transcript), and how can I force the AAchange tool to
start the analysis at the correct nucleotide (to get rid of the shift)?
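One common source of a systematic codon discrepancy like the one described above, not confirmed for these two tools, is a difference in coordinate convention: one program reads a position as 1-based and the other as 0-based. The sketch below shows, under that assumption, how the same number maps to a different codon. The function name is illustrative.

```python
def codon_number(cds_pos, one_based=True):
    """Map a position within the coding sequence to (codon, base-in-codon).

    Both returned values are 1-based. `one_based` states how `cds_pos`
    itself should be interpreted.
    """
    offset = cds_pos - 1 if one_based else cds_pos
    return offset // 3 + 1, offset % 3 + 1


# The same number read under the two conventions:
print(codon_number(3, one_based=True))   # third base of the first codon
print(codon_number(3, one_based=False))  # first base of the second codon
```

If the discrepancy between the two tools is always one base in the same direction, comparing a few positions this way may show which convention each input format expects.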
In case it helps, I have enclosed the link showing what I did
and what I got.
If you could help me with these issues it would be great for me (my work
and my boss)!
Charlotte Gueydan, PhD.
INSERM/Université Pierre et Marie Curie
Hôpital Tenon - bâtiment de recherche
4, rue de la Chine
75020 Paris cedex 20
tel : 01 56 01 83 75
I have a problem with the weblogo tool.
I have a clustalw alignment in fasta format that I'd like to visualize
as a logo. The sequence logo module ends with a success (green box), the
info tells me the amount and length of the input data. But the output is
empty, there are no plots (no matter if I select jpg, png, pdf or text).
The respective image can't be displayed "because it contains errors" or
is empty in case of text.
I suspect that the actual call of the weblogo tool doesn't succeed, but
I haven't yet figured out how to check this. Does anybody have hints on
where to look?
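One quick check, independent of Galaxy, is to download the output dataset and look at its size and first bytes. The sketch below assumes the file is available locally. The function name is illustrative, and the signatures are the standard leading bytes of PNG, JPEG and PDF files.

```python
import os

# Leading bytes that identify each format.
SIGNATURES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "jpg": b"\xff\xd8\xff",
    "pdf": b"%PDF",
}


def inspect_output(path):
    """Return (size_in_bytes, detected_format or None) for a downloaded file."""
    size = os.path.getsize(path)
    with open(path, "rb") as handle:
        head = handle.read(8)
    kind = next(
        (name for name, sig in SIGNATURES.items() if head.startswith(sig)),
        None,
    )
    return size, kind
```

A size of zero, or a file that starts with plain text instead of one of these signatures, would suggest that the underlying program wrote an error message (or nothing) in place of the image.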
Dr. Holger Klein
Core Facility Bioinformatics
Institute of Molecular Biology gGmbH (IMB)
Tel: +49(6131) 39 21511
Of course - Doh! Many thanks!!
On 4 Jul 2011, at 18:47, Oliver, Gavin wrote:
> "u" represents unknown intergenic transcripts.
> -----Original Message-----
> From: galaxy-user-bounces(a)lists.bx.psu.edu on behalf of David Matthews
> Sent: Mon 04/07/2011 17:48
> To: galaxy-user(a)lists.bx.psu.edu
> Subject: [galaxy-user] Looking for new transcripts with cufflinks
> I am working with HeLa cells and want to know how to get cufflinks etc. to highlight if a region of the genome is being transcribed that is not in the ensembl gtf. I know that cufflinks highlights with class code "j" regions that do not match a known gene and therefore may be novel but most of these arise from transcription on or near known genes. Does anyone know how to look for transcription that is clearly distinct from known genes? This is a wild goose chase but worth a peek just in case...
> Best Wishes,
> Dr David A. Matthews
> Senior Lecturer in Virology
> Room E49
> Department of Cellular and Molecular Medicine,
> School of Medical Sciences
> University Walk,
> University of Bristol
> BS8 1TD
> Tel. +44 117 3312058
> Fax. +44 117 3312091
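Following the reply above, transcripts with class code "u" can be pulled out of a Cuffcompare GTF with a few lines of Python. This is a minimal sketch that assumes the class code is stored in the ninth GTF column as an attribute of the form class_code "u"; the function name is illustrative.

```python
import re

# Matches the class_code attribute in the GTF attributes column.
CLASS_CODE = re.compile(r'class_code "([^"]+)"')


def select_class_code(gtf_lines, wanted="u"):
    """Yield GTF lines whose class_code attribute equals `wanted`."""
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = CLASS_CODE.search(fields[8])
        if match and match.group(1) == wanted:
            yield line
```

Passing an open file handle as gtf_lines and writing the yielded lines to a new file gives a GTF restricted to the candidate intergenic transcripts.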
I would really appreciate someone's input on some issues I am having with my
cuffdiff output for RNAseq data and how it compares with genes I have
viewed on IGV (or a similar browser) -
1. when I look at my differentially expressed transcripts file (generated
using ensembl hg19 as a reference with chr added on to obtain results with
ensembl gene names) and search for specific genes that I am interested in I
can not find them in my cuffdiff output file - even though I can visualize
these genes on IGV and they look obviously differentially regulated. Also,
given that the cuffdiff output for differentially expressed transcripts
does list all transcripts, including the ones that have not significantly
changed, wouldn't transcripts for these genes be listed anyway, even if my
visual ballparking on differential regulation is not statistically
significant? I would really like to know why I am missing genes from my
2. do you all get a good correlation between the top differentially
expressed transcripts/genes generated from cuffdiff and how the data looks
when visualized on IGV - ie. do your upregulated transcripts really look
upregulated when visualizing? I found that while some validate visually,
some do not which is confusing....
3. when visualizing on a browser, if different transcripts of one gene
are regulated differently - i.e. some are up in your treated sample but some
are down for the same gene - how can you tell which transcript ID from
cuffdiff corresponds to what you are seeing?
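For the first question, it can help to search the cuffdiff table programmatically and to look at the status column as well as the significance flag. The sketch below assumes a tab-separated cuffdiff output with the usual header names (test_id, gene, status, significant) and that the gene column may hold several comma-separated names. The function name is illustrative.

```python
import csv


def find_gene(diff_lines, gene_name):
    """Return (test_id, status, significant) for every row naming the gene."""
    reader = csv.DictReader(diff_lines, delimiter="\t")
    hits = []
    for row in reader:
        # The gene column can list several names separated by commas.
        names = (row.get("gene") or "").split(",")
        if gene_name in names:
            hits.append((row["test_id"], row["status"], row["significant"]))
    return hits
```

An empty result would mean the gene name is really absent from the file (for example because the reference annotation uses a different identifier), while a row whose status is not OK would mean the transcript was present but not tested.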
Thank you all for your help!
Johns Hopkins School of Medicine
I was wondering if you could help me to understand an error which I get
when mapping using Galaxy:
An error occurred running this job: Error aligning sequence. Warning:
Exhausted best-first chunk memory for read
BIO-SEQUENCER1_0011:1:1:9519:2064#CGATGT/1 (patid 703); skipping read
Warning: Exhausted best-first chunk memory for read
BIO-SEQUENCER1_0011:1:1:1795:6806#CGATGT/1 (patid 4461);
If you know what the error may mean, could you please let me know?
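To see how many reads are affected, the warnings can be counted directly from the job's error text. The sketch below assumes the message format shown above and tolerates the line wrapping introduced by email. The function name is illustrative. As far as I know, the Bowtie manual describes a --chunkmbs option in connection with this warning, but whether it can be set depends on how the Galaxy tool wraps Bowtie.

```python
import re

# One match per warning; \s+ absorbs line breaks added by wrapping.
WARNING = re.compile(
    r"Exhausted best-first chunk memory for read\s+(\S+)\s+\(patid (\d+)\)"
)


def summarize_warnings(log_text):
    """Return (number_of_warnings, list_of_read_names) found in the log text."""
    reads = [match.group(1) for match in WARNING.finditer(log_text)]
    return len(reads), reads
```

A handful of skipped reads out of millions is usually a different situation from a run in which most reads are skipped, so the count is worth comparing with the total number of input reads.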
Academic Renal Unit
University of Bristol
I am trying to analyze my RNAseq data with the TopHat-Cufflinks package
in Galaxy.
I used the Epicentre ScriptSeq strand-specific library construction
protocol, which I assume produces a second-strand library.
When I set the FR option to "second strand" and run TopHat and Cufflinks,
the resulting transcript orientation is confusing. Cufflinks-assembled
transcripts with multiple exons are oriented as expected. However,
transcripts that reside entirely in one exon/intron or intergenic
region are labeled the opposite way: if the TopHat accepted hits track
contains negative-strand reads (colored red), the transcript is labeled as
"positive" and vice versa. An example is shown in the attached
screenshot. Both transcripts TCONS_00000014 and TCONS_00000015 were
mapped to "+" strand, though reads that represent them had been mapped
to "-" strand.
Is there a way to solve this issue?
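One way to examine the discrepancy is to tally, for the reads under a suspicious transcript, the alignment strand from the SAM FLAG against the transcript strand recorded in the XS:A tag, if the aligner wrote one. The sketch below assumes the accepted hits have been converted to SAM text and restricted to the region of interest beforehand. The function name is illustrative.

```python
from collections import Counter


def strand_tally(sam_lines):
    """Count reads by (alignment strand from FLAG, XS:A tag or '.')."""
    tally = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:
            continue  # unmapped read
        read_strand = "-" if flag & 0x10 else "+"
        # Optional tags start at column 12; XS:A holds the transcript strand.
        xs = next((f[5:] for f in fields[11:] if f.startswith("XS:A:")), ".")
        tally[(read_strand, xs)] += 1
    return tally
```

If nearly all reads fall in ("-", "+") or ("+", "-"), the strand tag is being assigned opposite to the alignment strand, which would point to the library-type setting and not to Cufflinks itself.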
We have a UCSC genome browser mirror as well as a Galaxy mirror.
Galaxy has a feature that lets a user display data in the UCSC genome
browser as custom tracks. I have configured Galaxy to display data
in our UCSC browser mirror, but it doesn't work properly: after the
redirect to the genome browser page, the error message
"redirected to non-http(s): /root" appears. At the same time, displaying
Galaxy data at the official UCSC site works perfectly. What could be the
reasons for this?
Thank you in advance!