Thanks for looking into this, Guru :)
The masking tool is supposed to mask every column of the alignment in which any of the sequences has a quality score below XX. That means all sequences should stay the same length, even after HyPhy ignores the # symbols.
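The column-wise rule Melissa describes can be made concrete. Below is a minimal Python sketch. It assumes the alignment is held in memory as equal-length strings, with one list of integer quality scores per sequence. `mask_columns` is a hypothetical helper, not the Galaxy masking tool's code, and `threshold` stands in for the unspecified cutoff XX.

```python
def mask_columns(seqs, quals, threshold, mask_char="#"):
    """Mask every alignment column in which any sequence's quality is below threshold.

    Hypothetical helper: seqs is a list of equal-length aligned sequences,
    quals is a parallel list of per-base integer quality scores.
    """
    width = len(seqs[0])
    if any(len(s) != width for s in seqs) or any(len(q) != width for q in quals):
        raise ValueError("all sequences and quality lists must have the same length")

    # Column indices where at least one sequence is low quality.
    low = {i for q in quals for i, score in enumerate(q) if score < threshold}

    # Apply the same mask to every row, so masked positions line up across rows.
    return ["".join(mask_char if i in low else base for i, base in enumerate(s))
            for s in seqs]


if __name__ == "__main__":
    masked = mask_columns(["ACGT", "AGGT"], [[40, 10, 40, 40], [40, 40, 40, 5]], 20)
    print(masked)
    # Dropping '#' removes the same columns from every row, so lengths stay equal.
    print([len(s.replace("#", "")) for s in masked])
```

Under this rule every row loses the same number of characters when '#' is skipped. That is the property Melissa is relying on.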
Hi Melissa,

So it looks like you'll have to use 'N' as the masking character instead of '#'. You can either rerun quality masking on your alignments or do the following to convert the #s to Ns in your masked FASTA files:

1. Convert FASTA to tabular.
2. Use the 'Text manipulation -> Compute' tool on the tabular file from step (1) to convert #s to Ns, using the following expression: c2.replace(chr(35),"N")
3. Convert the output of step (2) back to FASTA using the 'Convert formats -> Tabular-to-FASTA' tool, with c1 as the title column and c3 as the sequence column.

Thanks,
Guru.

Begin forwarded message:

From: Guruprasad Ananda <gua110@bx.psu.edu>
Date: September 7, 2010 4:07:54 PM EDT
To: Sergei L Kosakovsky Pond <spond@ucsd.edu>
Cc: "Melissa A. Wilson Sayres" <mwilsonsayres@gmail.com>, Galaxy Lab <galaxy-lab@bx.psu.edu>
Subject: Re: [galaxy-lab] Question about Branch Lengths Estimation

Thanks Sergei! Melissa's alignments have been masked by quality scores. Will it work if she uses N as the masking character instead of #?
Guru.
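Guru's three-step Galaxy workflow (FASTA to tabular, Compute with c2.replace(chr(35),"N"), tabular back to FASTA) can also be done in a single pass outside Galaxy. Below is a minimal Python sketch. It assumes plain FASTA text in which header lines start with '>'. `hash_to_n` and `convert_file` are hypothetical names. As in the Compute expression, chr(35) is '#', and only sequence lines are changed.

```python
import sys
from pathlib import Path


def hash_to_n(fasta_text):
    """Return FASTA text with every '#' (chr(35)) in sequence lines replaced by 'N'.

    Header lines (starting with '>') are copied unchanged. This mirrors the Galaxy
    workflow, where only the sequence column (c2) is rewritten.
    """
    out_lines = []
    for line in fasta_text.splitlines(keepends=True):
        if line.startswith(">"):
            out_lines.append(line)
        else:
            out_lines.append(line.replace(chr(35), "N"))
    return "".join(out_lines)


def convert_file(src, dst):
    """Hypothetical file wrapper: read a '#'-masked FASTA file, write an N-masked copy."""
    text = Path(src).read_text()
    Path(dst).write_text(hash_to_n(text))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python hash_to_n.py masked.fasta masked_N.fasta
    convert_file(sys.argv[1], sys.argv[2])
```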
On Sep 7, 2010, at 4:04 PM, Sergei L Kosakovsky Pond wrote:
Hi Guru,

The alignment you attached contains '#' inside sequence strings. HyPhy will skip those, creating misaligned sequences of differing lengths. The error you see happens if the first sequence is shorter than the others.

There are two solutions:

1) Replace '#' with characters that HyPhy understands (e.g. '-').
2) Tell HyPhy to accept '#' as a valid token. I think this option can only work with NEXUS input files, because '#' is a sequence delimiter in the MEGA format, which HyPhy tries to autodetect...
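Sergei's explanation suggests a quick check before running the tool. Count how many characters each sequence keeps once '#' is skipped, then compare the counts. Below is a minimal Python sketch. It assumes HyPhy simply drops '#' characters, as described above. `parse_fasta` and `lengths_without` are hypothetical helpers, not HyPhy functions.

```python
def parse_fasta(text):
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines."""
    records = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def lengths_without(records, skipped="#"):
    """Length of each sequence after dropping the skipped characters."""
    return {n: sum(1 for c in seq if c not in skipped) for n, seq in records.items()}


if __name__ == "__main__":
    recs = parse_fasta(">hg18\nAC#T\n>panTro2\nACGT\n")
    # Skipping '#' leaves hg18 shorter than panTro2: the misalignment Sergei describes.
    print(lengths_without(recs))
    # Replacing '#' with '-' (solution 1) keeps every position, so the counts match.
    print(lengths_without({n: s.replace("#", "-") for n, s in recs.items()}))
```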
Sergei

Hi Sergei,

Melissa is trying to use the HyPhy branch lengths estimator in Galaxy on 4-way primate alignments. However, HyPhy is producing the following error:

    Error:
    Internal Error in 'Write2Site' - index is too high (using compact representation)
    Call stack
    2 : Read Data Set ds from file PROMPT_FOR_FILE
    1 : ExecuteAFile from file "/tmp/tmpX5jYY7" using basepath /tmp/. reading input from _genomeScreenOptions

Is this because her sequences are pretty long? Is there a limit on sequence length?
Here is a sample input file that causes this problem, in case you want a test case:
http://bx.psu.edu/~gua110/Galaxy165-%5bchr22_HCOM_fasta%5d.fasta
Tree = (((hg18,panTro2),ponAbe2),rheMac2)

Thanks,
Guru.

Begin forwarded message:

From: "Melissa A. Wilson Sayres" <mwilsonsayres@gmail.com>
Date: September 7, 2010 2:58:44 PM EDT
To: Guruprasad Ananda <gua110@bx.psu.edu>
Cc: "galaxy-user@lists.bx.psu.edu" <galaxy-user@lists.bx.psu.edu>
Subject: Re: Question about Branch Lengths Estimation

Hi Guru,

It worked with most of the HC files (only one failed: chr1_HC_fasta), but it hasn't worked with any of the four-way alignments, using either of these trees:
(((hg18,panTro2),ponAbe2),rheMac2)
or
((((hg18),panTro2),ponAbe2),rheMac2)

I tested chr1 and chr2. Ideas?

Thanks!
Melissa

On Tue, Sep 7, 2010 at 1:27 PM, Melissa A. Wilson Sayres <mwilsonsayres@gmail.com> wrote:
Thanks Guru!!

It seems I just have a knack for finding these little quirks. :) I really appreciate you figuring this out!
Best,
Melissa

On Tue, Sep 7, 2010 at 11:48 AM, Guruprasad Ananda <gua110@bx.psu.edu> wrote:

Hi Melissa,

This seems to be the case with pairwise alignments only. 3-way and larger alignments work well with the regular Newick tree definitions. For instance, for a 4-way alignment, you could use: ((hg17,panTro1),(mm5,rn3))
I'll update the tool help with this info.

Thanks,
Guru.

On Sep 7, 2010, at 11:40 AM, Melissa A. Wilson Sayres wrote:

Hi Guru,

Thanks for getting back to me so quickly! I'll give that a try and let you know how it goes. If this is the problem, perhaps the instructions under the tool could be updated, because the current directions give a different formatting of the tree. It is a little counterintuitive to have to put parentheses around one species. Do you think it will be the same for four species? (That is, that I put parentheses around just hg18?)

Thanks again!!

Best,
Melissa

On Tuesday, September 7, 2010, Guruprasad Ananda <gua110@bx.psu.edu> wrote:
Hi Melissa,

Looks like HyPhy doesn't like the way you specified your phylogenetic tree. I tried running the tool on your test dataset with the tree defined as ((hg18),panTro2) and it ran just fine! Please give it a shot and let me know if the problem persists. You can find a working example of the same here:
http://main.g2.bx.psu.edu/u/guru/h/imported-melissa-test-history (see history item #8)
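Since the problems in this thread came down to how the tree string was written, it can help to sanity-check a tree before pasting it into the tool. Below is a minimal Python sketch. It assumes a simple Newick string with no internal node labels and no quoted names. `newick_leaves` is a hypothetical helper and knows nothing about which forms HyPhy accepts, for example whether a pairwise tree needs the ((hg18),panTro2) form. It only checks that the parentheses balance and lists the leaf names, so they can be compared against the FASTA headers.

```python
import re


def newick_leaves(tree):
    """Check parenthesis balance in a simple Newick string and return its leaf names.

    Assumes no internal node labels and no quoted names; branch lengths
    such as ':0.1' are stripped from the names.
    """
    depth = 0
    for ch in tree:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced tree: ')' without matching '('")
    if depth != 0:
        raise ValueError("unbalanced tree: '(' without matching ')'")

    body = tree.strip().rstrip(";")
    names = (token.split(":")[0].strip() for token in re.split(r"[(),]", body))
    return [name for name in names if name]


if __name__ == "__main__":
    print(newick_leaves("((hg18),panTro2)"))
    print(newick_leaves("(((hg18,panTro2),ponAbe2),rheMac2)"))
```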
Thanks,
Guru.

On Sep 3, 2010, at 5:21 PM, Melissa A. Wilson Sayres wrote:
Hi there,

I am trying to use the Branch Lengths Estimation tool under the Evolution heading.
I have a pairwise alignment (hg18,panTro2), but when I run the tool using the HKY85 model on my FASTA-formatted alignment, I get nothing in the output.
It doesn't give an error, but instead gives:

    143: Branch Lengths on data 120
    empty, format: tabular, database: hg18
    Info: Single Alignment Analyses
Any ideas?

Thanks!!
Melissa

--
Melissa A. Wilson Sayres
NSF Graduate Research Fellow, Bioinformatics & Genomics
306 Wartik Lab
University Park, PA 16802
maw397@psu.edu

It is far better to grasp the Universe as it really is than to persist in delusion, however satisfying and reassuring. -- Carl Sagan
_______________________________________________
galaxy-user mailing list
galaxy-user@lists.bx.psu.edu
http://lists.bx.psu.edu/listinfo/galaxy-user
/-------------------------------------------------------------------------------------------------/
Ignorance more frequently begets confidence than does knowledge
Charles Darwin
/-------------------------------------------------------------------------------------------------/
Sergei L. Kosakovsky Pond, Ph.D.
Assistant Adjunct Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
Theodore Gildred Facility
220 Dickinson St Suite A
San Diego, CA 92103
USA
Phone: +1 619 543 8898
Fax : +1 619 543 5066
Web : http://www.hyphy.org/sergei/
HyPhy Page: http://www.hyphy.org/
Adaptive Evolution Server: http://www.datamonkey.org/
galaxy-lab mailing list
galaxy-lab@lists.bx.psu.edu
http://lists.bx.psu.edu/listinfo/galaxy-lab