Hi,

I wanted to let you know that I've resolved this issue.

The problem is that the tool must consider the likelihoods of all possible unphased genotypes with N alleles and M copies across P pools. This number grows very quickly when both N and M are large, as they can be at low-complexity loci and in deep pools.
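
To give a sense of the scale, here is a small Python sketch (an illustration, not freebayes code). It assumes that an unphased genotype with M copies drawn from N alleles is a multiset, so a single pool has C(N + M - 1, M) possible genotypes. `naive_joint_count` is a hypothetical upper bound for enumerating all P pools jointly; the tool does not necessarily search that whole space.

```python
from math import comb


def unphased_genotype_count(n_alleles: int, ploidy: int) -> int:
    """Number of distinct unphased genotypes for one sample or pool.

    An unphased genotype is a multiset of `ploidy` allele copies drawn
    from `n_alleles` alleles, so the count is the multiset coefficient
    C(n_alleles + ploidy - 1, ploidy).
    """
    return comb(n_alleles + ploidy - 1, ploidy)


def naive_joint_count(n_alleles: int, ploidy: int, pools: int) -> int:
    """Hypothetical size of the joint space if every pool's genotype
    were enumerated together (an upper bound for illustration only)."""
    return unphased_genotype_count(n_alleles, ploidy) ** pools


if __name__ == "__main__":
    # Show how the per-pool count grows with ploidy (M) and alleles (N).
    for ploidy in (2, 4, 10, 190):
        for n_alleles in (2, 3, 5, 8):
            count = unphased_genotype_count(n_alleles, ploidy)
            print(f"ploidy={ploidy:>3} alleles={n_alleles}: {count:,}")
```

For example, a diploid sample with 2 alleles has 3 genotypes, while a pool of ploidy 190 has 18,336 genotypes with 3 alleles and 57,211,376 with 5 alleles.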

See https://github.com/ekg/freebayes/commit/576bc703c246035342538a0feeecd13c28f3d2eb and https://groups.google.com/forum/#!topic/freebayes/R6dReM4sPoQ for a discussion of how this can be dealt with. The --use-best-n-alleles option previously applied only to SNPs, which made it ineffective against the combinatorial expansion, because most multiallelic loci contain indels or other kinds of non-SNP variation. In the most recent version it is no longer limited to SNPs, so it can be set low (e.g. 2 or 3 in your case) to prevent the memory blowup.
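
The pruning idea behind --use-best-n-alleles can be sketched as follows. This is a hypothetical simplification, not the tool's implementation: it assumes alleles are ranked by the summed quality scores of their supporting reads, and the allele labels and qualities are made up. The point is how sharply the per-pool genotype count at ploidy 190 falls as n is lowered.

```python
from math import comb


def best_n_alleles(support: dict[str, list[int]], n: int) -> list[str]:
    """Keep the n alleles with the largest summed supporting quality.

    `support` maps a (hypothetical) allele label to the quality scores of
    the reads that carry it. n <= 0 keeps every allele.
    """
    ranked = sorted(support, key=lambda allele: sum(support[allele]), reverse=True)
    return ranked if n <= 0 else ranked[:n]


def unphased_genotype_count(n_alleles: int, ploidy: int) -> int:
    """Multiset coefficient C(n_alleles + ploidy - 1, ploidy)."""
    return comb(n_alleles + ploidy - 1, ploidy)


if __name__ == "__main__":
    # Made-up low-complexity locus: reference, one SNP and several indels.
    support = {
        "ref": [30] * 120,
        "del1": [25] * 40,
        "ins2": [20] * 15,
        "snp": [35] * 10,
        "del3": [20] * 5,
    }
    ploidy = 190  # 95 diploid subjects in one pool
    for n in (0, 3, 2):
        kept = best_n_alleles(support, n)
        print(n, kept, unphased_genotype_count(len(kept), ploidy))
```

With this made-up data, keeping all five alleles gives 57,211,376 genotypes per pool, n = 3 gives 18,336, and n = 2 gives 191.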

The most recent version of freebayes is not yet available in Galaxy, but I am working on getting it there. Sorry for the trouble. I hope you'll still have a chance to analyze your data with the pooled functionality in freebayes.

Erik


On Wed, Apr 24, 2013 at 9:55 PM, Jennifer Jackson <jen@bx.psu.edu> wrote:
Hi Nic,

Yes, the program is running into a memory issue with this setting (confirmed by reviewing your bug report, thank you!).

This issue is not specific to Galaxy or to our server/cluster; it appears to lie in the tool itself, and it comes up on different systems and in different cases whenever the ploidy is set to something other than 1 or 2. So, sticking with ploidy = 2 is one option.

You might try contacting the tool author at the Freebayes google group for more detailed advice, the link is:
https://groups.google.com/forum/#!forum/freebayes

Best,

Jen
Galaxy team


On 4/18/13 8:34 AM, Nicola Smith wrote:

Hi,

I am new to this and I hope someone can help. I have pooled sequencing data that I am trying to analyse using Galaxy. I’ve done quite a bit of online searching and it seems that FreeBayes should be able to do this if I select “set population”, tick the “Assume that samples result from pooled sequencing” option and change the ploidy to n × 2 (the number of chromosome copies in the pool, where n is the number of subjects and the organism is diploid).

However, whenever I do this I get an error, usually just “Killed”.

I was originally setting my ploidy rather high (190, as I have 95 subjects pooled), so I wondered if this was the problem. However, it also fails with a ploidy of only 4. I’ve tried various things to see where I am going wrong:

All with the same BAM file:

- Set population model options: Do not set → works
- Set population model options: set; Assume that samples result from pooled sequencing: not ticked; Default ploidy for the analysis: 2 → works
- Set population model options: set; Assume that samples result from pooled sequencing: ticked; Default ploidy for the analysis: 2 → works
- Set population model options: set; Assume that samples result from pooled sequencing: ticked; Default ploidy for the analysis: 4 → fails (killed)
- Set population model options: set; Assume that samples result from pooled sequencing: ticked; Default ploidy for the analysis: 10 → fails (killed)

It seems that the ploidy setting is what I am getting wrong, as it works with pooled sequencing ticked and a ploidy of 2. I’m sure I have to change the ploidy, though; otherwise, how does the program know how many subjects are in the pool? Also, everything I’ve read says you have to change the ploidy.
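
As the message says, the ploidy is how the pool size reaches the model: it counts chromosome copies, so 95 diploid subjects give 190. A minimal sketch, assuming each pooled genotype with M copies corresponds to an allele frequency of k/M; `pool_ploidy` and `frequency_grid` are hypothetical helpers, not part of freebayes or Galaxy.

```python
from fractions import Fraction


def pool_ploidy(subjects: int, ploidy_per_subject: int = 2) -> int:
    """Total chromosome copies in one pooled sample (hypothetical helper)."""
    if subjects < 1 or ploidy_per_subject < 1:
        raise ValueError("subjects and ploidy_per_subject must be positive")
    return subjects * ploidy_per_subject


def frequency_grid(ploidy: int) -> list[Fraction]:
    """Allele frequencies a pooled genotype with `ploidy` copies can express."""
    return [Fraction(k, ploidy) for k in range(ploidy + 1)]


if __name__ == "__main__":
    m = pool_ploidy(95)  # 95 diploid subjects
    grid = frequency_grid(m)
    print(m, len(grid), grid[1])  # 190 191 1/190
```

Under this assumption, a ploidy of 2 would let the model resolve only frequencies of 0, 1/2 and 1 for the whole pool, which is why a pool-sized ploidy matters even though it makes the genotype space much larger.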

I apologise if my question is naive. As I said, I am new to Galaxy and this is the first thing I am trying to do!

Any help / suggestions would be appreciated,

Thanks,

Nic



___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:

  http://galaxyproject.org/search/mailinglists/

-- 
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org
