The masking tool is supposed to mask every column of the alignment in which
any of the sequences has a quality score below XX. That means all sequences in
an alignment *should* be the same length, even after HyPhy ignores the #
symbols.
Thus, there shouldn't be a problem with using # as a masking symbol rather
than N.
I will try changing the # characters to N, but I wanted to mention that the
solutions you sent don't address the possibility that the masking tool itself
might produce sequences of different lengths.
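The equal-length argument above can be sketched in Python (the language of the Compute expression used later in this thread). This is not the Galaxy masking tool's code: the function name, the per-position quality lists, and the `threshold` argument (standing in for XX) are illustrative assumptions. It shows that when masking is applied column-wise, deleting the mask character leaves every sequence with the same length.

```python
def mask_low_quality_columns(seqs, quals, threshold, mask_char="N"):
    """Mask each column in which at least one sequence has quality < threshold.

    Hypothetical helper: seqs are equal-length aligned sequences and quals
    holds one list of per-position quality scores per sequence.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs) or any(len(q) != length for q in quals):
        raise ValueError("sequences and quality lists must all have the same length")
    # True for every column where any sequence falls below the threshold.
    masked_cols = [any(q[i] < threshold for q in quals) for i in range(length)]
    # Apply the same column mask to every sequence.
    return [
        "".join(mask_char if masked_cols[i] else s[i] for i in range(length))
        for s in seqs
    ]


seqs = ["ACGT", "ACGA"]
quals = [[40, 10, 40, 40], [40, 40, 40, 5]]
hashed = mask_low_quality_columns(seqs, quals, threshold=20, mask_char="#")
print(hashed)                                     # ['A#G#', 'A#G#']
print({len(s.replace("#", "")) for s in hashed})  # {2}
```

If masking were instead applied to each sequence independently, the lengths could differ once '#' is skipped.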
I'll let you know how it goes.
Thanks,
Melissa
On Tue, Sep 7, 2010 at 4:28 PM, Guruprasad Ananda <gua110@bx.psu.edu> wrote:
Hi Melissa,
So it looks like you'll have to use 'N' as the masking character instead of '#'.
You can either rerun quality masking on your alignments or do the following
to convert your #s to Ns in your masked fasta files.
1. Convert FASTA to tabular.
2. Use the 'Text manipulation -> Compute' tool on the tabular file from
step (1) to convert #s to Ns, using the following expression:
c2.replace(chr(35),"N")
3. Convert the output of step (2) to FASTA using the 'Convert formats ->
Tabular-to-FASTA' tool, with c1 as the title column and c3 as the sequence
column.
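Outside Galaxy, the same #-to-N conversion can be sketched in a few lines of Python. The helper name `hash_to_n` is hypothetical; it works on FASTA text directly instead of going through a tabular file, and it leaves header lines (those starting with '>') untouched so that only sequence characters change.

```python
def hash_to_n(fasta_text):
    """Replace '#' with 'N' on FASTA sequence lines; header lines are left untouched."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            out.append(line)  # header: keep as-is
        else:
            out.append(line.replace(chr(35), "N"))  # chr(35) is '#'
    result = "\n".join(out)
    # Preserve a trailing newline if the input had one.
    return result + "\n" if fasta_text.endswith("\n") else result


print(hash_to_n(">hg18\nAC##GT\n>panTro2\nAC##GA\n"), end="")
```

To convert a file, read its contents, pass them through the helper, and write the result to a new file. Because each '#' becomes exactly one 'N', sequence lengths are unchanged.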
Thanks,
Guru.
Begin forwarded message:
From: Guruprasad Ananda <gua110@bx.psu.edu>
Date: September 7, 2010 4:07:54 PM EDT
To: Sergei L Kosakovsky Pond <spond@ucsd.edu>
Cc: "Melissa A. Wilson Sayres" <mwilsonsayres@gmail.com>, Galaxy Lab <galaxy-lab@bx.psu.edu>
Subject: Re: [galaxy-lab] Question about Branch Lengths Estimation
Thanks Sergei! Melissa's alignments have been masked by quality scores.
Will it work if she used N as a masking character instead of #?
Guru.
On Sep 7, 2010, at 4:04 PM, Sergei L Kosakovsky Pond wrote:
Hi Guru,
The alignment you attached contains '#' inside sequence strings.
HyPhy will skip those, creating misaligned sequences of differing lengths.
The error you see happens if the first sequence is shorter than others.
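The effect Sergei describes can be illustrated with a small check. This only mimics deleting '#' from each sequence independently; it is not HyPhy's parser, and both function names are hypothetical.

```python
def lengths_after_skipping(records, skip="#"):
    """Return (name, length) pairs after deleting every `skip` character."""
    return [(name, len(seq.replace(skip, ""))) for name, seq in records]


def first_sequence_shorter(records, skip="#"):
    """True if, after skipping, the first sequence is shorter than any other."""
    lengths = [n for _, n in lengths_after_skipping(records, skip)]
    return any(lengths[0] < n for n in lengths[1:])


records = [("hg18", "AC#GT"), ("panTro2", "ACGGT")]
print(lengths_after_skipping(records))  # [('hg18', 4), ('panTro2', 5)]
print(first_sequence_shorter(records))  # True
```

Running a check like this on a masked file before submitting it would flag the length mismatch that otherwise surfaces only as the Write2Site error.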
There are two solutions:
1) Replace '#' with characters that HyPhy understands (e.g. '-')
2) Tell HyPhy to accept '#' as a valid token.
I think this option can only work with NEXUS input files, because '#' is a
sequence delimiter in the MEGA format, which HyPhy tries to autodetect...
Sergei
Hi Sergei,
Melissa is trying to use the HyPhy branch lengths estimator on Galaxy with
4-way primate alignments. However, HyPhy produces the following error:
Error:
Internal Error in 'Write2Site' - index is too high (using compact representation)
Call stack
2 : Read Data Set ds from file PROMPT_FOR_FILE
1 : ExecuteAFile from file "/tmp/tmpX5jYY7" using basepath /tmp/. reading
input from _genomeScreenOptions
Is this because her sequences are very long? Is there a limit on sequence
length?
Here is a sample input file that causes this problem, in case you wanted a
test case:
http://bx.psu.edu/~gua110/Galaxy165-%5bchr22_HCOM_fasta%5d.fasta
Tree = (((hg18,panTro2),ponAbe2),rheMac2)
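Before pasting a tree into the tool, a quick check for balanced parentheses can catch a missing or extra bracket. This is a rough, hypothetical helper, not a Newick parser: it does not handle quoted labels or comments.

```python
def newick_parens_balanced(tree):
    """Return True if every '(' in the tree string has a matching ')'."""
    depth = 0
    for ch in tree:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # a ')' appeared before its matching '('
    return depth == 0


print(newick_parens_balanced("(((hg18,panTro2),ponAbe2),rheMac2)"))  # True
print(newick_parens_balanced("((hg18,panTro2),ponAbe2),rheMac2)"))   # False
```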
Thanks,
Guru.
Begin forwarded message:
From: "Melissa A. Wilson Sayres" <mwilsonsayres@gmail.com>
Date: September 7, 2010 2:58:44 PM EDT
To: Guruprasad Ananda <gua110@bx.psu.edu>
Cc: "galaxy-user@lists.bx.psu.edu" <galaxy-user@lists.bx.psu.edu>
Subject: Re: Question about Branch Lengths Estimation
Hi Guru,
It worked with most of the HC files (just one failed: chr1_HC_fasta), but it
hasn't worked with any of the four-way alignments, using either of these
trees:
(((hg18,panTro2),ponAbe2),rheMac2)
or
((((hg18),panTro2),ponAbe2),rheMac2)
I tested chr1 and chr2.
Ideas?
Thanks!
Melissa
On Tue, Sep 7, 2010 at 1:27 PM, Melissa A. Wilson Sayres <mwilsonsayres@gmail.com> wrote:
Thanks Guru!!
It seems I just have a knack for finding these little quirks. :)
I really appreciate you figuring this out!
Best,
Melissa
On Tue, Sep 7, 2010 at 11:48 AM, Guruprasad Ananda <gua110@bx.psu.edu> wrote:
Hi Melissa,
This seems to happen only with pairwise alignments. Alignments of three or
more species work well with regular Newick tree definitions. For instance,
for a 4-way alignment, you could use: ((hg17,panTro1),(mm5,rn3))
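When experimenting with trees like these, it can also help to confirm that the leaf names in the tree match the sequence names in the alignment. A rough sketch, assuming simple unquoted Newick labels; the helper names are hypothetical, and this is not how Galaxy or HyPhy validate trees.

```python
import re


def newick_leaf_names(tree):
    """Return leaf labels from a simple Newick string.

    Assumes unquoted labels with no comments; a leaf label is any run of
    characters right after '(' or ',' up to the next delimiter, so internal
    node labels (which follow ')') and branch lengths (after ':') are skipped.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", tree)


def name_mismatches(tree, alignment_names):
    """Return (names missing from the tree, tree leaves missing from the alignment)."""
    leaves = set(newick_leaf_names(tree))
    names = set(alignment_names)
    return sorted(names - leaves), sorted(leaves - names)


print(newick_leaf_names("((hg17,panTro1),(mm5,rn3))"))
# ['hg17', 'panTro1', 'mm5', 'rn3']
print(name_mismatches("(((hg18,panTro2),ponAbe2),rheMac2)",
                      ["hg18", "panTro2", "ponAbe2"]))
# ([], ['rheMac2'])
```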
I'll update the tool help with this info.
Thanks,
Guru.
On Sep 7, 2010, at 11:40 AM, Melissa A. Wilson Sayres wrote:
Hi Guru,
Thanks for getting back to me so quickly!
I'll give that a try and let you know how it goes. If this is the
problem, perhaps the instructions for the tool could be updated, because
the current directions give a different formatting of the tree. It is a
little counterintuitive to have to put parentheses around one species. Do
you think it will be the same for four species? (That is, should I put
parentheses around just hg18?)
Thanks again!!
Best,
Melissa
On Tuesday, September 7, 2010, Guruprasad Ananda <gua110@bx.psu.edu> wrote:
Hi Melissa,
Looks like HyPhy doesn't like the way you specified your phylogenetic tree.
I tried running the tool on your test dataset with the tree defined as
((hg18),panTro2) and it ran just fine! Please give it a shot and let me know
if the problem persists. You can find a working example of the same here:
http://main.g2.bx.psu.edu/u/guru/h/imported-melissa-test-history (see
history item #8)
Thanks,
Guru.
On Sep 3, 2010, at 5:21 PM, Melissa A. Wilson Sayres wrote:
Hi there,
I am trying to use the tool Branch Lengths Estimation under the Evolution
heading.
I have a pairwise alignment - (hg18,panTro2), but when I try to run the
tool, using the HKY85 model on my FASTA formatted alignment, I get nothing
in the output.
It doesn't give an error, but instead gives:
143: Branch Lengths on data 120
empty, format: tabular, database: hg18
Info: Single Alignment Analyses
Any ideas?
Thanks!!
Melissa
--
Melissa A. Wilson Sayres
NSF Graduate Research Fellow, Bioinformatics & Genomics
306 Wartik Lab
University Park, PA 16802
maw397@psu.edu
It is far better to grasp the Universe as it really is than to persist in
delusion, however satisfying and reassuring. -- Carl Sagan
_______________________________________________
galaxy-user mailing list
galaxy-user@lists.bx.psu.edu
http://lists.bx.psu.edu/listinfo/galaxy-user
/-------------------------------------------------------------------------------------------------/
Ignorance more frequently begets confidence than does knowledge
Charles Darwin
/-------------------------------------------------------------------------------------------------/
Sergei L. Kosakovsky Pond, Ph.D.
Assistant Adjunct Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
Theodore Gildred Facility
220 Dickinson St Suite A
San Diego, CA 92103
USA
Phone: +1 619 543 8898
Fax : +1 619 543 5066
Web : http://www.hyphy.org/sergei/
HyPhy Page: http://www.hyphy.org/
Adaptive Evolution Server: http://www.datamonkey.org/
_______________________________________________
galaxy-lab mailing list
galaxy-lab@lists.bx.psu.edu
http://lists.bx.psu.edu/listinfo/galaxy-lab