From commits-noreply@bitbucket.org Mon May 16 10:04:44 2011
From: Bitbucket
To: galaxy-commits@lists.galaxyproject.org
Subject: [galaxy-commits] commit/galaxy-central: 2 new changesets
Date: Mon, 16 May 2011 14:04:38 +0000
Message-ID: <20110516140438.20360.51285@bitbucket03.managed.contegix.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============8860946898754127966=="

--===============8860946898754127966==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

2 new changesets in galaxy-central:

http://bitbucket.org/galaxy/galaxy-central/changeset/0d6acef66750/
changeset: r5565:0d6acef66750
user: fubar
date: 2011-05-16 15:12:19
summary: repairs to rgfakePed tool for generating null genotype data - since removing the _code.py trick of renaming composite file components means all of the parts (eg map and ped) acquire the default name of RgeneticsData, that's what the tool has to create - rather than sensibly named components based on the title - since only way to fix this wart is to reintroduce a bigger wart using the post execution hook, decided to just go with that default base_name to make things more sustainable in the long term
affected #: 3 files (594 bytes)

--- a/test-data/rgtestouts/rgfakePed/rgfakePedtest1.lped	Fri May 13 21:24:03 2011 -0400
+++ b/test-data/rgtestouts/rgfakePed/rgfakePedtest1.lped	Mon May 16 09:12:19 2011 -0400
@@ -9,8 +9,8 @@
-  rgfakePedtest1.ped
-  rgfakePedtest1.map
-
-  This is simulated null genotype data generated by Rgenetics!
-
-  rgfakePed.py called with command line:
-  /share/shared/galaxy/tools/rgenetics/rgfakePed.py --title rgfakePedtest1 -o /share/shared/galaxy/test-data/rgtestouts/rgfakePed/rgfakePedtest1.lped -p /share/shared/galaxy/test-data/rgtestouts/rgfakePed -c 20 -n 40 -s 10 -w 0 -v 0 -l pbed -d T -m 0 -M 0
+  RgeneticsData.map
+  RgeneticsData.ped
+
+  This is simulated null genotype data generated by Rgenetics!
+
+  rgfakePed.py called with command line:
+  /udd/rerla/galaxy-central/tools/rgenetics/rgfakePed.py --title rgfakePedtest1 -o /export/tmp/tmpy7a643/database/files/000/dataset_1.dat -p /udd/rerla/galaxy-central/database/job_working_directory/1/dataset_1_files -c 20 -n 40 -s 10 -w 0.0 -v 0 -l L -d T -m 0.0 -M 0.0
\ No newline at end of file
--- a/tools/rgenetics/rgfakePed.py	Fri May 13 21:24:03 2011 -0400
+++ b/tools/rgenetics/rgfakePed.py	Mon May 16 09:12:19 2011 -0400
@@ -1,103 +1,103 @@
-#! /usr/local/bin/python2.4
-# pedigree data faker
-# specifically designed for scalability testing of
-# Shaun Purcel's PLINK package
-# derived from John Ziniti's original suggestion
-# allele frequency spectrum and random mating added
-# ross lazarus me fecit january 13 2007
-# copyright ross lazarus 2007
-# without psyco
-# generates about 10k snp genotypes in 2k subjects (666 trios) per minute or so.
-# so 500k (a billion genotypes), at about 4 trios/min will a couple of hours to generate
-# psyco makes it literally twice as quick!!
-# all rights reserved except as granted under the terms of the LGPL
-# see http://www.gnu.org/licenses/lgpl.html
-# for a copy of the license you receive with this software
-# and for your rights and obligations
-# especially if you wish to modify or redistribute this code
-# january 19 added random missingness inducer
-# currently about 15M genos/minute without psyco, 30M/minute with
-# so a billion genos should take about 40 minutes with psyco or 80 without...
-# added mendel error generator jan 23 rml
-
-
+#! /usr/local/bin/python2.4
+# pedigree data faker
+# specifically designed for scalability testing of
+# Shaun Purcel's PLINK package
+# derived from John Ziniti's original suggestion
+# allele frequency spectrum and random mating added
+# ross lazarus me fecit january 13 2007
+# copyright ross lazarus 2007
+# without psyco
+# generates about 10k snp genotypes in 2k subjects (666 trios) per minute or so.
+# so 500k (a billion genotypes), at about 4 trios/min will a couple of hours to generate
+# psyco makes it literally twice as quick!!
+# all rights reserved except as granted under the terms of the LGPL +# see http://www.gnu.org/licenses/lgpl.html=20 +# for a copy of the license you receive with this software +# and for your rights and obligations +# especially if you wish to modify or redistribute this code +# january 19 added random missingness inducer +# currently about 15M genos/minute without psyco, 30M/minute with +# so a billion genos should take about 40 minutes with psyco or 80 without... +# added mendel error generator jan 23 rml + + import random,sys,time,os,string - -from optparse import OptionParser - - =20 -width =3D 500000 -ALLELES =3D ['1','2','3','4'] -prog =3D os.path.split(sys.argv[0])[-1] -debug =3D 0 - -"""Natural-order sorting, supporting embedded numbers. -# found at http://lists.canonical.org/pipermail/kragen-hacks/2005-October/00= 0419.html -note test code there removed to conserve brain space -foo9bar2 < foo10bar2 < foo10bar10 - -""" -import random, re, sys - -def natsort_key(item):=20 - chunks =3D re.split('(\d+(?:\.\d+)?)', item) - for ii in range(len(chunks)): - if chunks[ii] and chunks[ii][0] in '0123456789': - if '.' in chunks[ii]: numtype =3D float - else: numtype =3D int - # wrap in tuple with '0' to explicitly specify numbers come first - chunks[ii] =3D (0, numtype(chunks[ii])) - else: - chunks[ii] =3D (1, chunks[ii]) - return (chunks, item) - -def natsort(seq): - "Sort a sequence of text strings in a reasonable order." - alist =3D [item for item in seq] - alist.sort(key=3Dnatsort_key) - return alist - - -def makeUniformMAFdist(low=3D0.02, high=3D0.5): - """Fake a non-uniform maf distribution to make the data - more interesting. 
Provide uniform 0.02-0.5 distribution""" - MAFdistribution =3D [] - for i in xrange(int(100*low),int(100*high)+1): - freq =3D i/100.0 # uniform - MAFdistribution.append(freq) - return MAFdistribution - -def makeTriangularMAFdist(low=3D0.02, high=3D0.5, beta=3D5): - """Fake a non-uniform maf distribution to make the data - more interesting - more rare alleles """ - MAFdistribution =3D [] - for i in xrange(int(100*low),int(100*high)+1): - freq =3D (51 - i)/100.0 # large numbers of small allele freqs - for j in range(beta*i): # or i*i for crude exponential distribution=20 - MAFdistribution.append(freq) - return MAFdistribution - -def makeFbathead(rslist=3D[], chromlist=3D[], poslist=3D[], width=3D100000): - """header row - """ - res =3D ['%s_%s_%s' % (chromlist[x], poslist[x], rslist[x]) for x in ran= ge(len(rslist))] - return ' '.join(res) - -def makeMap( width=3D500000, MAFdistribution=3D[], useGP=3DFalse): - """make snp allele and frequency tables for consistent generation""" - usegp =3D 1 - snpdb =3D 'snp126' - hgdb =3D 'hg18' - alleles =3D [] - freqs =3D [] - rslist =3D [] - chromlist =3D [] - poslist =3D [] - for snp in range(width): - random.shuffle(ALLELES) - alleles.append(ALLELES[0:2]) # need two DIFFERENT alleles! + +from optparse import OptionParser + +defbasename=3D"RgeneticsData" =20 +width =3D 500000 +ALLELES =3D ['1','2','3','4'] +prog =3D os.path.split(sys.argv[0])[-1] +debug =3D 0 + +"""Natural-order sorting, supporting embedded numbers. +# found at http://lists.canonical.org/pipermail/kragen-hacks/2005-October/00= 0419.html +note test code there removed to conserve brain space +foo9bar2 < foo10bar2 < foo10bar10 + +""" +import random, re, sys + +def natsort_key(item):=20 + chunks =3D re.split('(\d+(?:\.\d+)?)', item) + for ii in range(len(chunks)): + if chunks[ii] and chunks[ii][0] in '0123456789': + if '.' 
in chunks[ii]: numtype =3D float + else: numtype =3D int + # wrap in tuple with '0' to explicitly specify numbers come first + chunks[ii] =3D (0, numtype(chunks[ii])) + else: + chunks[ii] =3D (1, chunks[ii]) + return (chunks, item) + +def natsort(seq): + "Sort a sequence of text strings in a reasonable order." + alist =3D [item for item in seq] + alist.sort(key=3Dnatsort_key) + return alist + + +def makeUniformMAFdist(low=3D0.02, high=3D0.5): + """Fake a non-uniform maf distribution to make the data + more interesting. Provide uniform 0.02-0.5 distribution""" + MAFdistribution =3D [] + for i in xrange(int(100*low),int(100*high)+1): + freq =3D i/100.0 # uniform + MAFdistribution.append(freq) + return MAFdistribution + +def makeTriangularMAFdist(low=3D0.02, high=3D0.5, beta=3D5): + """Fake a non-uniform maf distribution to make the data + more interesting - more rare alleles """ + MAFdistribution =3D [] + for i in xrange(int(100*low),int(100*high)+1): + freq =3D (51 - i)/100.0 # large numbers of small allele freqs + for j in range(beta*i): # or i*i for crude exponential distribution=20 + MAFdistribution.append(freq) + return MAFdistribution + +def makeFbathead(rslist=3D[], chromlist=3D[], poslist=3D[], width=3D100000): + """header row + """ + res =3D ['%s_%s_%s' % (chromlist[x], poslist[x], rslist[x]) for x in ran= ge(len(rslist))] + return ' '.join(res) + +def makeMap( width=3D500000, MAFdistribution=3D[], useGP=3DFalse): + """make snp allele and frequency tables for consistent generation""" + usegp =3D 1 + snpdb =3D 'snp126' + hgdb =3D 'hg18' + alleles =3D [] + freqs =3D [] + rslist =3D [] + chromlist =3D [] + poslist =3D [] + for snp in range(width): + random.shuffle(ALLELES) + alleles.append(ALLELES[0:2]) # need two DIFFERENT alleles! 
freqs.append(random.choice(MAFdistribution)) # more rare alleles - if useGP: + if useGP: try: import MySQLdb genome =3D MySQLdb.Connect('localhost', 'hg18', 'G3gn0m3') @@ -106,402 +106,402 @@ if debug: print 'cannot connect to local copy of golden path' usegp =3D 0 - if usegp and useGP: # urrrgghh getting snps into chrom offset order is c= omplicated.... - curs.execute('use %s' % hgdb) - print 'Collecting %d real rs numbers - this may take a while' % width - # get a random draw of enough reasonable (hapmap) snps with frequenc= y data - s =3D '''select distinct chrom,chromEnd, name from %s where avHet > = 0 and chrom not like '%%random' - group by name order by rand() limit %d''' % (snpdb,width) - curs.execute(s) - reslist =3D curs.fetchall() - reslist =3D ['%s\t%09d\t%s' % (x[3:],y,z) for x,y,z in reslist] # ge= t rid of chr - reslist =3D natsort(reslist) - for s in reslist: - chrom,pos,rs =3D s.split('\t') - rslist.append(rs) - chromlist.append(chrom) - poslist.append(pos) - else: - chrom =3D '1' - for snp in range(width): - pos =3D '%d' % (1000*snp) - rs =3D 'rs00%d' % snp - rslist.append(rs) - chromlist.append(chrom) - poslist.append(pos) - return alleles,freqs, rslist, chromlist, poslist - -def writeMap(fprefix =3D '', fpath=3D'./', rslist=3D[], chromlist=3D[], posl= ist=3D[], width =3D 500000): - """make a faked plink compatible map file - fbat files + if usegp and useGP: # urrrgghh getting snps into chrom offset order is c= omplicated.... 
+ curs.execute('use %s' % hgdb) + print 'Collecting %d real rs numbers - this may take a while' % width + # get a random draw of enough reasonable (hapmap) snps with frequenc= y data + s =3D '''select distinct chrom,chromEnd, name from %s where avHet > = 0 and chrom not like '%%random' + group by name order by rand() limit %d''' % (snpdb,width) + curs.execute(s) + reslist =3D curs.fetchall() + reslist =3D ['%s\t%09d\t%s' % (x[3:],y,z) for x,y,z in reslist] # ge= t rid of chr + reslist =3D natsort(reslist) + for s in reslist: + chrom,pos,rs =3D s.split('\t') + rslist.append(rs) + chromlist.append(chrom) + poslist.append(pos) + else: + chrom =3D '1' + for snp in range(width): + pos =3D '%d' % (1000*snp) + rs =3D 'rs00%d' % snp + rslist.append(rs) + chromlist.append(chrom) + poslist.append(pos) + return alleles,freqs, rslist, chromlist, poslist + +def writeMap(fprefix =3D '', fpath=3D'./', rslist=3D[], chromlist=3D[], posl= ist=3D[], width =3D 500000): + """make a faked plink compatible map file - fbat files have the map written as a header line""" outf =3D '%s.map'% (fprefix) - outf =3D os.path.join(fpath,outf) - amap =3D open(outf, 'w') - res =3D ['%s\t%s\t0\t%s' % (chromlist[x],rslist[x],poslist[x]) for x in = range(len(rslist))] - res.append('') - amap.write('\n'.join(res)) - amap.close() - -def makeMissing(genos=3D[], missrate =3D 0.03, missval =3D '0'): - """impose some random missingness""" - nsnps =3D len(genos) - for snp in range(nsnps): # ignore first 6 columns - if random.random() <=3D missrate: - genos[snp] =3D '%s %s' % (missval,missval) - return genos - -def makeTriomissing(genos=3D[], missrate =3D 0.03, missval =3D '0'): - """impose some random missingness on a trio - moth eaten like real data"= "" - for person in (0,1): - nsnps =3D len(genos[person]) - for snp in range(nsnps): - for person in [0,1,2]: - if random.random() <=3D missrate: - genos[person][snp] =3D '%s %s' % (missval,missval) - return genos - - -def makeTriomendel(p1g=3D(0,0),p2g=3D(0,0), 
kiddip =3D (0,0)): - """impose some random mendels on a trio - there are 8 of the 9 mating types we can simulate reasonable errors for - Note, since random mating dual het parents can produce any genotype we c= an't generate an interesting - error for them, so the overall mendel rate will be lower than mendrate, = depending on - allele frequency...""" - if p1g[0] <> p1g[1] and p2g[0] <> p2g[1]: # both parents het - return kiddip # cannot simulate a mendel error - anything is leg= al! - elif (p1g[0] <> p1g[1]): # p1 is het parent so p2 must be hom - if p2g[0] =3D=3D 0: # - make child p2 opposite hom for error - kiddip =3D (1,1) - else: - kiddip =3D (0,0) - elif (p2g[0] <> p2g[1]): # p2 is het parent so p1 must be hom - if p1g[0] =3D=3D 0: # - make child p1 opposite hom for error - kiddip =3D (1,1) - else: - kiddip =3D (0,0) - elif (p1g[0] =3D=3D p1g[1]): # p1 is hom parent and if we get here p2 mu= st also be hom - if p1g[0] =3D=3D p2g[0]: # both parents are same hom - make child ei= ther het or opposite hom for error - if random.random() <=3D 0.5: - kiddip =3D (0,1) - else: - if p1g[0] =3D=3D 0: - kiddip =3D (1,1) - else: - kiddip =3D (0,0) - else: # parents are opposite hom - return any hom as an error - if random.random() <=3D 0.5: - kiddip =3D (0,0) - else: - kiddip =3D (1,1) - return kiddip - =20 - =20 - - -def makeFam(width=3D100, freqs=3D{}, alleles=3D{}, trio=3D1, missrate=3D0.03= , missval=3D'0', mendrate=3D0.0): - """this family is a simple trio, constructed by random mating two random= genotypes - TODO: why not generate from chromosomes - eg hapmap - set each haplotype locus according to the conditional - probability implied by the surrounding loci - eg use both neighboring pa= irs, triplets - and quads as observed in hapmap ceu""" - dadped =3D '%d 1 0 0 1 1 %s' - mumped =3D '%d 2 0 0 2 1 %s' # a mother is a mum where I come from :) - kidped =3D '%d 3 1 2 %d %d %s' - family =3D [] # result accumulator - sex =3D random.choice((1,2)) # for the kid - 
affected =3D random.choice((1,2)) - genos =3D [[],[],[]] # dad, mum, kid - 0/1 for common,rare initially, th= en xform to alleles - # parent1...kidn lists of 0/1 for common,rare initially, then xformed to= alleles - for snp in xrange(width): - f =3D freqs[snp] =20 - for i in range(2): # do dad and mum - p =3D random.random() - a1 =3D a2 =3D 0 - if p <=3D f: # a rare allele - a1 =3D 1 - p =3D random.random() - if p <=3D f: # a rare allele - a2 =3D 1 - if a1 > a2: - a1,a2 =3D a2,a1 # so ordering consistent - 00,01,11 - dip =3D (a1,a2) - genos[i].append(dip) # tuples of 0,1 - a1 =3D random.choice(genos[0][snp]) # dad gamete =20 - a2 =3D random.choice(genos[1][snp]) # mum gamete - if a1 > a2: - a1,a2 =3D a2,a1 # so ordering consistent - 00,01,11 - kiddip =3D (a1,a2) # NSFW mating! - genos[2].append(kiddip) - if mendrate > 0: - if random.random() <=3D mendrate: - genos[2][snp] =3D makeTriomendel(genos[0][snp],genos[1][snp]= , kiddip) - achoice =3D alleles[snp] - for g in genos: # now convert to alleles using allele dict - a1 =3D achoice[g[snp][0]] # get allele letter - a2 =3D achoice[g[snp][1]] =20 - g[snp] =3D '%s %s' % (a1,a2) - if missrate > 0: - genos =3D makeTriomissing(genos=3Dgenos,missrate=3Dmissrate, missval= =3Dmissval) - family.append(dadped % (trio,' '.join(genos[0]))) # create a row for eac= h member of trio - family.append(mumped % (trio,' '.join(genos[1]))) - family.append(kidped % (trio,sex,affected,' '.join(genos[2]))) - return family - -def makePerson(width=3D100, aff=3D1, freqs=3D{}, alleles=3D{}, id=3D1, missr= ate =3D 0.03, missval=3D'0'): - """make an entire genotype vector for an independent subject""" - sex =3D random.choice((1,2)) - if not aff: - aff =3D random.choice((1,2)) - genos =3D [] #0/1 for common,rare initially, then xform to alleles - family =3D [] - personped =3D '%d 1 0 0 %d %d %s' - poly =3D (0,1) - for snp in xrange(width): - achoice =3D alleles[snp] - f =3D freqs[snp] - p =3D random.random() - a1 =3D a2 =3D 0 - if p <=3D f: # a 
rare allele - a1 =3D 1 - p =3D random.random() - if p <=3D f: # a rare allele - a2 =3D 1 - if a1 > a2: - a1,a2 =3D a2,a1 # so ordering consistent - 00,01,11 - a1 =3D achoice[a1] # get allele letter - a2 =3D achoice[a2] - g =3D '%s %s' % (a1,a2) - genos.append(g) - if missrate > 0.0: - genos =3D makeMissing(genos=3Dgenos,missrate=3Dmissrate, missval=3Dm= issval) - family.append(personped % (id,sex,aff,' '.join(genos))) - return family - -def makeHapmap(fprefix=3D 'fakebigped',width=3D100, aff=3D[], freqs=3D{}, - alleles=3D{}, nsubj =3D 2000, trios =3D True, mendrate=3D0.03= , missrate =3D 0.03, missval=3D'0'): - """ fake a hapmap file and a pedigree file for eg haploview - this is arranged as the transpose of a ped file - cols are subjects, row= s are markers - so we need to generate differently since we can't do the transpose in ra= m reliably for - a few billion genotypes... - """ - outheadprefix =3D 'rs# alleles chrom pos strand assembly# center protLSI= D assayLSID panelLSID QCcode %s' - cfake5 =3D ["illumina","urn:LSID:illumina.hapmap.org:Protocol:Golden_Gat= e_1.0.0:1",=20 -"urn:LSID:illumina.hapmap.org:Assay:27741:1","urn:lsid:dcc.hapmap.org:Panel:= CEPH-30-trios:1","QC+"] - yfake5 =3D ["illumina","urn:LSID:illumina.hapmap.org:Protocol:Golden_Gat= e_1.0.0:1",=20 -"urn:LSID:illumina.hapmap.org:Assay:27741:1","urn:LSID:dcc.hapmap.org:Panel:= Yoruba-30-trios:1","QC+"] - sampids =3D ids - if trios: - ts =3D '%d trios' % int(nsubj/3.) - else: - ts =3D '%d unrelated subjects' % nsubj - res =3D ['#%s fake hapmap file %d snps and %s, faked by %s' % (timenow()= , width, ts, prog),] - res.append('# ross lazarus me fecit') - res.append(outheadprefix % ' '.join(sampids)) # make a header compatible= with hapmap extracts - outf =3D open('%s.hmap' % (fprefix), 'w') - started =3D time.time() - if trios: - ntrios =3D int(nsubj/3.) 
- for n in ntrios: # each is a dict - row =3D copy.copy(cfake5) # get first fields - row =3D map(str,row) - if race =3D=3D "YRI": - row +=3D yfake5 - elif race =3D=3D 'CEU': - row +=3D cfake5 - else: - row +=3D ['NA' for x in range(5)] # 5 dummy fields =3D cente= r protLSID assayLSID panelLSID QCcode - row +=3D [''.join(sorted(line[x])) for x in sampids] # the genot= ypes in header (sorted) sample id order - res.append(' '.join(row)) - res.append('') - outfname =3D '%s_%s_%s_%dkb.geno' % (gene,probeid,race,2*flank/1000) - f =3D file(outfname,'w') - f.write('\n'.join(res)) - f.close() - print '### %s: Wrote %d lines to %s' % (timenow(), len(res),outfname) - =20 - + outf =3D os.path.join(fpath,outf) + amap =3D open(outf, 'w') + res =3D ['%s\t%s\t0\t%s' % (chromlist[x],rslist[x],poslist[x]) for x in = range(len(rslist))] + res.append('') + amap.write('\n'.join(res)) + amap.close() + +def makeMissing(genos=3D[], missrate =3D 0.03, missval =3D '0'): + """impose some random missingness""" + nsnps =3D len(genos) + for snp in range(nsnps): # ignore first 6 columns + if random.random() <=3D missrate: + genos[snp] =3D '%s %s' % (missval,missval) + return genos + +def makeTriomissing(genos=3D[], missrate =3D 0.03, missval =3D '0'): + """impose some random missingness on a trio - moth eaten like real data"= "" + for person in (0,1): + nsnps =3D len(genos[person]) + for snp in range(nsnps): + for person in [0,1,2]: + if random.random() <=3D missrate: + genos[person][snp] =3D '%s %s' % (missval,missval) + return genos + + +def makeTriomendel(p1g=3D(0,0),p2g=3D(0,0), kiddip =3D (0,0)): + """impose some random mendels on a trio + there are 8 of the 9 mating types we can simulate reasonable errors for + Note, since random mating dual het parents can produce any genotype we c= an't generate an interesting + error for them, so the overall mendel rate will be lower than mendrate, = depending on + allele frequency...""" + if p1g[0] <> p1g[1] and p2g[0] <> p2g[1]: # both parents het + 
return kiddip # cannot simulate a mendel error - anything is leg= al! + elif (p1g[0] <> p1g[1]): # p1 is het parent so p2 must be hom + if p2g[0] =3D=3D 0: # - make child p2 opposite hom for error + kiddip =3D (1,1) + else: + kiddip =3D (0,0) + elif (p2g[0] <> p2g[1]): # p2 is het parent so p1 must be hom + if p1g[0] =3D=3D 0: # - make child p1 opposite hom for error + kiddip =3D (1,1) + else: + kiddip =3D (0,0) + elif (p1g[0] =3D=3D p1g[1]): # p1 is hom parent and if we get here p2 mu= st also be hom + if p1g[0] =3D=3D p2g[0]: # both parents are same hom - make child ei= ther het or opposite hom for error + if random.random() <=3D 0.5: + kiddip =3D (0,1) + else: + if p1g[0] =3D=3D 0: + kiddip =3D (1,1) + else: + kiddip =3D (0,0) + else: # parents are opposite hom - return any hom as an error + if random.random() <=3D 0.5: + kiddip =3D (0,0) + else: + kiddip =3D (1,1) + return kiddip + =20 + =20 + + +def makeFam(width=3D100, freqs=3D{}, alleles=3D{}, trio=3D1, missrate=3D0.03= , missval=3D'0', mendrate=3D0.0): + """this family is a simple trio, constructed by random mating two random= genotypes + TODO: why not generate from chromosomes - eg hapmap + set each haplotype locus according to the conditional + probability implied by the surrounding loci - eg use both neighboring pa= irs, triplets + and quads as observed in hapmap ceu""" + dadped =3D '%d 1 0 0 1 1 %s' + mumped =3D '%d 2 0 0 2 1 %s' # a mother is a mum where I come from :) + kidped =3D '%d 3 1 2 %d %d %s' + family =3D [] # result accumulator + sex =3D random.choice((1,2)) # for the kid + affected =3D random.choice((1,2)) + genos =3D [[],[],[]] # dad, mum, kid - 0/1 for common,rare initially, th= en xform to alleles + # parent1...kidn lists of 0/1 for common,rare initially, then xformed to= alleles + for snp in xrange(width): + f =3D freqs[snp] =20 + for i in range(2): # do dad and mum + p =3D random.random() + a1 =3D a2 =3D 0 + if p <=3D f: # a rare allele + a1 =3D 1 + p =3D random.random() + if p <=3D f: 
# a rare allele + a2 =3D 1 + if a1 > a2: + a1,a2 =3D a2,a1 # so ordering consistent - 00,01,11 + dip =3D (a1,a2) + genos[i].append(dip) # tuples of 0,1 + a1 =3D random.choice(genos[0][snp]) # dad gamete =20 + a2 =3D random.choice(genos[1][snp]) # mum gamete + if a1 > a2: + a1,a2 =3D a2,a1 # so ordering consistent - 00,01,11 + kiddip =3D (a1,a2) # NSFW mating! + genos[2].append(kiddip) + if mendrate > 0: + if random.random() <=3D mendrate: + genos[2][snp] =3D makeTriomendel(genos[0][snp],genos[1][snp]= , kiddip) + achoice =3D alleles[snp] + for g in genos: # now convert to alleles using allele dict + a1 =3D achoice[g[snp][0]] # get allele letter + a2 =3D achoice[g[snp][1]] =20 + g[snp] =3D '%s %s' % (a1,a2) + if missrate > 0: + genos =3D makeTriomissing(genos=3Dgenos,missrate=3Dmissrate, missval= =3Dmissval) + family.append(dadped % (trio,' '.join(genos[0]))) # create a row for eac= h member of trio + family.append(mumped % (trio,' '.join(genos[1]))) + family.append(kidped % (trio,sex,affected,' '.join(genos[2]))) + return family + +def makePerson(width=3D100, aff=3D1, freqs=3D{}, alleles=3D{}, id=3D1, missr= ate =3D 0.03, missval=3D'0'): + """make an entire genotype vector for an independent subject""" + sex =3D random.choice((1,2)) + if not aff: + aff =3D random.choice((1,2)) + genos =3D [] #0/1 for common,rare initially, then xform to alleles + family =3D [] + personped =3D '%d 1 0 0 %d %d %s' + poly =3D (0,1) + for snp in xrange(width): + achoice =3D alleles[snp] + f =3D freqs[snp] + p =3D random.random() + a1 =3D a2 =3D 0 + if p <=3D f: # a rare allele + a1 =3D 1 + p =3D random.random() + if p <=3D f: # a rare allele + a2 =3D 1 + if a1 > a2: + a1,a2 =3D a2,a1 # so ordering consistent - 00,01,11 + a1 =3D achoice[a1] # get allele letter + a2 =3D achoice[a2] + g =3D '%s %s' % (a1,a2) + genos.append(g) + if missrate > 0.0: + genos =3D makeMissing(genos=3Dgenos,missrate=3Dmissrate, missval=3Dm= issval) + family.append(personped % (id,sex,aff,' '.join(genos))) + 
return family + +def makeHapmap(fprefix=3D 'fakebigped',width=3D100, aff=3D[], freqs=3D{}, + alleles=3D{}, nsubj =3D 2000, trios =3D True, mendrate=3D0.03= , missrate =3D 0.03, missval=3D'0'): + """ fake a hapmap file and a pedigree file for eg haploview + this is arranged as the transpose of a ped file - cols are subjects, row= s are markers + so we need to generate differently since we can't do the transpose in ra= m reliably for + a few billion genotypes... + """ + outheadprefix =3D 'rs# alleles chrom pos strand assembly# center protLSI= D assayLSID panelLSID QCcode %s' + cfake5 =3D ["illumina","urn:LSID:illumina.hapmap.org:Protocol:Golden_Gat= e_1.0.0:1",=20 +"urn:LSID:illumina.hapmap.org:Assay:27741:1","urn:lsid:dcc.hapmap.org:Panel:= CEPH-30-trios:1","QC+"] + yfake5 =3D ["illumina","urn:LSID:illumina.hapmap.org:Protocol:Golden_Gat= e_1.0.0:1",=20 +"urn:LSID:illumina.hapmap.org:Assay:27741:1","urn:LSID:dcc.hapmap.org:Panel:= Yoruba-30-trios:1","QC+"] + sampids =3D ids + if trios: + ts =3D '%d trios' % int(nsubj/3.) + else: + ts =3D '%d unrelated subjects' % nsubj + res =3D ['#%s fake hapmap file %d snps and %s, faked by %s' % (timenow()= , width, ts, prog),] + res.append('# ross lazarus me fecit') + res.append(outheadprefix % ' '.join(sampids)) # make a header compatible= with hapmap extracts + outf =3D open('%s.hmap' % (fprefix), 'w') + started =3D time.time() + if trios: + ntrios =3D int(nsubj/3.) 
+ for n in ntrios: # each is a dict + row =3D copy.copy(cfake5) # get first fields + row =3D map(str,row) + if race =3D=3D "YRI": + row +=3D yfake5 + elif race =3D=3D 'CEU': + row +=3D cfake5 + else: + row +=3D ['NA' for x in range(5)] # 5 dummy fields =3D cente= r protLSID assayLSID panelLSID QCcode + row +=3D [''.join(sorted(line[x])) for x in sampids] # the genot= ypes in header (sorted) sample id order + res.append(' '.join(row)) + res.append('') + outfname =3D '%s_%s_%s_%dkb.geno' % (gene,probeid,race,2*flank/1000) + f =3D file(outfname,'w') + f.write('\n'.join(res)) + f.close() + print '### %s: Wrote %d lines to %s' % (timenow(), len(res),outfname) + =20 + def makePed(fprefix=3D 'fakebigped', fpath=3D'./', - width=3D500000, nsubj=3D2000, MAFdistribution=3D[],alleles=3D{}, - freqs=3D{}, fbatstyle=3DTrue, mendrate =3D 0.0, missrate =3D 0.0= 3, missval=3D'0',fbathead=3D''): - """fake trios with mendel consistent random mating genotypes in offspring - with consistent alleles and MAFs for the sample""" - res =3D [] - if fbatstyle: # add a header row with the marker names + width=3D500000, nsubj=3D2000, MAFdistribution=3D[],alleles=3D{}, + freqs=3D{}, fbatstyle=3DTrue, mendrate =3D 0.0, missrate =3D 0.0= 3, missval=3D'0',fbathead=3D''): + """fake trios with mendel consistent random mating genotypes in offspring + with consistent alleles and MAFs for the sample""" + res =3D [] + if fbatstyle: # add a header row with the marker names res.append(fbathead) # header row for fbat outfname =3D '%s.ped'% (fprefix) outfname =3D os.path.join(fpath,outfname) - outf =3D open(outfname,'w') - ntrios =3D int(nsubj/3.) 
- outf =3D open(outfile, 'w') - started =3D time.time() - for trio in xrange(ntrios): - family =3D makeFam(width=3Dwidth, freqs=3Dfreqs, alleles=3Dalleles, = trio=3Dtrio, - missrate =3D missrate, mendrate=3Dmendrate, missval= =3Dmissval) - res +=3D family - if (trio + 1) % 10 =3D=3D 0: # write out to keep ram requirements re= asonable - if (trio + 1) % 50 =3D=3D 0: # show progress - dur =3D time.time() - started - if dur =3D=3D 0: - dur =3D 1.0 - print 'Trio: %d, %4.1f genos/sec at %6.1f sec' % (trio + 1, = width*trio*3/dur, dur) - outf.write('\n'.join(res)) - outf.write('\n') - res =3D [] - if len(res) > 0: # some left - outf.write('\n'.join(res)) - outf.write('\n') - outf.close() + outf =3D open(outfname,'w') + ntrios =3D int(nsubj/3.) + outf =3D open(outfile, 'w') + started =3D time.time() + for trio in xrange(ntrios): + family =3D makeFam(width=3Dwidth, freqs=3Dfreqs, alleles=3Dalleles, = trio=3Dtrio, + missrate =3D missrate, mendrate=3Dmendrate, missval= =3Dmissval) + res +=3D family + if (trio + 1) % 10 =3D=3D 0: # write out to keep ram requirements re= asonable + if (trio + 1) % 50 =3D=3D 0: # show progress + dur =3D time.time() - started + if dur =3D=3D 0: + dur =3D 1.0 + print 'Trio: %d, %4.1f genos/sec at %6.1f sec' % (trio + 1, = width*trio*3/dur, dur) + outf.write('\n'.join(res)) + outf.write('\n') + res =3D [] + if len(res) > 0: # some left + outf.write('\n'.join(res)) + outf.write('\n') + outf.close() if debug: - print '##makeped : %6.1f seconds total runtime' % (time.time() - sta= rted) - + print '##makeped : %6.1f seconds total runtime' % (time.time() - sta= rted) + def makeIndep(fprefix =3D 'fakebigped', fpath=3D'./', - width=3D500000, Nunaff=3D1000, Naff=3D1000, MAFdistribution=3D= [], - alleles=3D{}, freqs=3D{}, fbatstyle=3DTrue, missrate =3D 0.03,= missval=3D'0',fbathead=3D''): - """fake a random sample from a random mating sample - with consistent alleles and MAFs""" - res =3D [] - Ntot =3D Nunaff + Naff - status =3D [1,]*Nunaff + 
width=3D500000, Nunaff=3D1000, Naff=3D1000, MAFdistribution=3D= [], + alleles=3D{}, freqs=3D{}, fbatstyle=3DTrue, missrate =3D 0.03,= missval=3D'0',fbathead=3D''): + """fake a random sample from a random mating sample + with consistent alleles and MAFs""" + res =3D [] + Ntot =3D Nunaff + Naff + status =3D [1,]*Nunaff status +=3D [2,]*Nunaff outf =3D '%s.ped' % (fprefix) - outf =3D os.path.join(fpath,outf) - outf =3D open(outf, 'w') - started =3D time.time() - #sample =3D personMaker(width=3Dwidth, affs=3Dstatus, freqs=3Dfreqs, all= eles=3Dalleles, Ntomake=3DNtot) - if fbatstyle: # add a header row with the marker names - res.append(fbathead) # header row for fbat - for id in xrange(Ntot): - if id < Nunaff: - aff =3D 1 - else: - aff =3D 2 - family =3D makePerson(width=3Dwidth, aff=3Daff, freqs=3Dfreqs, allel= es=3Dalleles, id=3Did+1) - res +=3D family - if (id % 50 =3D=3D 0): # write out to keep ram requirements reasonab= le - if (id % 200 =3D=3D 0): # show progress - dur =3D time.time() - started - if dur =3D=3D 0: - dur =3D 1.0 - print 'Id: %d, %4.1f genos/sec at %6.1f sec' % (id, width*id= /dur, dur) - outf.write('\n'.join(res)) - outf.write('\n') - res =3D [] - if len(res) > 0: # some left - outf.write('\n'.join(res)) - outf.write('\n') - outf.close() - print '## makeindep: %6.1f seconds total runtime' % (time.time() - start= ed) - -u =3D """ -Generate either trios or independent subjects with a prespecified -number of random alleles and a uniform or triangular MAF distribution for -stress testing. No LD is simulated - alleles are random. Offspring for -trios are generated by random mating the random parental alleles so there are -no Mendelian errors unless the -M option is used. 
Mendelian errors are gener= ated -randomly according to the possible errors given the parental mating type alt= hough -this is fresh code and not guaranteed to work quite right yet - comments wel= comed - -Enquiries to ross.lazarus(a)gmail.com - -eg to generate 700 trios with 500k snps, use: -fakebigped.py -n 2100 -s 500000 -or to generate 500 independent cases and 500 controls with 100k snps and 0.0= 2 missingness (MCAR), use: -fakebigped.py -c 500 -n 1000 -s 100000 -m 0.02 - -fakebigped.py -o myfake -m 0.05 -s 100000 -n 2000 -will make fbat compatible myfake.ped with 100k markers in -666 trios (total close to 2000 subjects), a uniform MAF distribution and abo= ut 5% MCAR missing - -fakebigped.py -o myfake -m 0.05 -s 100000 -n 2000 -M 0.05 -will make fbat compatible myfake.ped with 100k markers in -666 trios (total close to 2000 subjects), a uniform MAF distribution, -about 5% Mendelian errors and about 5% MCAR missing - - -fakebigped.py -o myfakecc -m 0.05 -s 100000 -n 2000 -c 1000 -l -will make plink compatible myfakecc.ped and myfakecc.map (that's what the -l= option does), -with 100k markers in 1000 cases and 1000 controls (affection status 2 and 1 = respectively), -a triangular MAF distribution (more rare alleles) and about 5% MCAR missing - -You should see about 1/4 million genotypes/second so about an hour for a -500k snps in 2k subjects and about a 4GB ped file - these are BIG!! - -""" - -import sys, os, glob - -galhtmlprefix =3D """ - - - - - - - - - -
    -""" - - -def doImport(outfile=None,outpath=None): - """ import into one of the new html composite data types for Rgenetics - Dan Blankenberg with mods by Ross Lazarus - October 2007 - """ - flist = glob.glob(os.path.join(outpath,'*')) - outf = open(outfile,'w') - outf.write(galhtmlprefix % prog) - for i, data in enumerate( flist ): - outf.write('
  • %s
  • \n' % (os.path.split(data)[-1],os.path.split(data)[-1])) + outf = os.path.join(fpath,outf) + outf = open(outf, 'w') + started = time.time() + #sample = personMaker(width=width, affs=status, freqs=freqs, alleles=alleles, Ntomake=Ntot) + if fbatstyle: # add a header row with the marker names + res.append(fbathead) # header row for fbat + for id in xrange(Ntot): + if id < Nunaff: + aff = 1 + else: + aff = 2 + family = makePerson(width=width, aff=aff, freqs=freqs, alleles=alleles, id=id+1) + res += family + if (id % 50 == 0): # write out to keep ram requirements reasonable + if (id % 200 == 0): # show progress + dur = time.time() - started + if dur == 0: + dur = 1.0 + print 'Id: %d, %4.1f genos/sec at %6.1f sec' % (id, width*id/dur, dur) + outf.write('\n'.join(res)) + outf.write('\n') + res = [] + if len(res) > 0: # some left + outf.write('\n'.join(res)) + outf.write('\n') + outf.close() + print '## makeindep: %6.1f seconds total runtime' % (time.time() - started) + +u = """ +Generate either trios or independent subjects with a prespecified +number of random alleles and a uniform or triangular MAF distribution for +stress testing. No LD is simulated - alleles are random. Offspring for +trios are generated by random mating the random parental alleles so there are +no Mendelian errors unless the -M option is used. 
Mendelian errors are generated +randomly according to the possible errors given the parental mating type although +this is fresh code and not guaranteed to work quite right yet - comments welcomed + +Enquiries to ross.lazarus(a)gmail.com + +eg to generate 700 trios with 500k snps, use: +fakebigped.py -n 2100 -s 500000 +or to generate 500 independent cases and 500 controls with 100k snps and 0.02 missingness (MCAR), use: +fakebigped.py -c 500 -n 1000 -s 100000 -m 0.02 + +fakebigped.py -o myfake -m 0.05 -s 100000 -n 2000 +will make fbat compatible myfake.ped with 100k markers in +666 trios (total close to 2000 subjects), a uniform MAF distribution and about 5% MCAR missing + +fakebigped.py -o myfake -m 0.05 -s 100000 -n 2000 -M 0.05 +will make fbat compatible myfake.ped with 100k markers in +666 trios (total close to 2000 subjects), a uniform MAF distribution, +about 5% Mendelian errors and about 5% MCAR missing + + +fakebigped.py -o myfakecc -m 0.05 -s 100000 -n 2000 -c 1000 -l +will make plink compatible myfakecc.ped and myfakecc.map (that's what the -l option does), +with 100k markers in 1000 cases and 1000 controls (affection status 2 and 1 respectively), +a triangular MAF distribution (more rare alleles) and about 5% MCAR missing + +You should see about 1/4 million genotypes/second so about an hour for a +500k snps in 2k subjects and about a 4GB ped file - these are BIG!! + +""" + +import sys, os, glob + +galhtmlprefix = """ + + + + + + + + + +
    +""" + + +def doImport(outfile=None,outpath=None): + """ import into one of the new html composite data types for Rgenetics + Dan Blankenberg with mods by Ross Lazarus + October 2007 + """ + flist = glob.glob(os.path.join(outpath,'*')) + outf = open(outfile,'w') + outf.write(galhtmlprefix % prog) + for i, data in enumerate( flist ): + outf.write('
  • %s
  • \n' % (os.path.split(data)[-1],os.path.split(data)[-1])) outf.write('

    This is simulated null genotype data generated by Rgenetics!

    ') outf.write('%s called with command line:
    ' % prog)
         outf.write(' '.join(sys.argv))
    -    outf.write('\n
    \n') - outf.write("
    ") - outf.close() - - - -if __name__ == "__main__": - """ - """ - parser = OptionParser(usage=u, version="%prog 0.01") - a = parser.add_option - a("-n","--nsubjects",type="int",dest="Ntot", - help="nsubj: total number of subjects",default=2000) - a("-t","--title",dest="title", - help="title: file basename for outputs",default='fakeped') - a("-c","--cases",type="int",dest="Naff", - help="number of cases: independent subjects with status set to 2 (ie cases). If not set, NTOT/3 trios will be generated", default = 0) - a("-s","--snps",dest="width",type="int", - help="snps: total number of snps per subject", default=1000) - a("-d","--distribution",dest="MAFdist",default="Uniform", - help="MAF distribution - default is Uniform, can be Triangular") - a("-o","--outf",dest="outf", - help="Output file", default = 'fakeped') - a("-p","--outpath",dest="outpath", - help="Path for output files", default = './') - a("-l","--pLink",dest="outstyle", default='L', - help="Ped files as for Plink - no header, separate Map file - default is Plink style") - a("-w","--loWmaf", type="float", dest="lowmaf", default=0.01, help="Lower limit for SNP MAF (minor allele freq)") - a("-m","--missing",dest="missrate",type="float", - help="missing: probability of missing MCAR - default 0.0", default=0.0) - a("-v","--valmiss",dest="missval", - help="missing character: Missing allele code - usually 0 or N - default 0", default="0") - a("-M","--Mendelrate",dest="mendrate",type="float", - help="Mendelian error rate: probability of a mendel error per trio, default=0.0", default=0.0) - a("-H","--noHGRS",dest="useHG",type="int", - help="Use local copy of UCSC snp126 database to generate real rs numbers", default=True) - (options,args) = parser.parse_args() - low = options.lowmaf + outf.write('\n\n') + outf.write("
    ") + outf.close() + + + +if __name__ == "__main__": + """ + """ + parser = OptionParser(usage=u, version="%prog 0.01") + a = parser.add_option + a("-n","--nsubjects",type="int",dest="Ntot", + help="nsubj: total number of subjects",default=2000) + a("-t","--title",dest="title", + help="title: file basename for outputs",default='fakeped') + a("-c","--cases",type="int",dest="Naff", + help="number of cases: independent subjects with status set to 2 (ie cases). If not set, NTOT/3 trios will be generated", default = 0) + a("-s","--snps",dest="width",type="int", + help="snps: total number of snps per subject", default=1000) + a("-d","--distribution",dest="MAFdist",default="Uniform", + help="MAF distribution - default is Uniform, can be Triangular") + a("-o","--outf",dest="outf", + help="Output file", default = 'fakeped') + a("-p","--outpath",dest="outpath", + help="Path for output files", default = './') + a("-l","--pLink",dest="outstyle", default='L', + help="Ped files as for Plink - no header, separate Map file - default is Plink style") + a("-w","--loWmaf", type="float", dest="lowmaf", default=0.01, help="Lower limit for SNP MAF (minor allele freq)") + a("-m","--missing",dest="missrate",type="float", + help="missing: probability of missing MCAR - default 0.0", default=0.0) + a("-v","--valmiss",dest="missval", + help="missing character: Missing allele code - usually 0 or N - default 0", default="0") + a("-M","--Mendelrate",dest="mendrate",type="float", + help="Mendelian error rate: probability of a mendel error per trio, default=0.0", default=0.0) + a("-H","--noHGRS",dest="useHG",type="int", + help="Use local copy of UCSC snp126 database to generate real rs numbers", default=True) + (options,args) = parser.parse_args() + low = options.lowmaf try: os.makedirs(options.outpath) except: pass - if options.MAFdist.upper() == 'U': - mafDist =
makeUniformMAFdist(low=low, high=0.5) - else: + if options.MAFdist.upper() == 'U': + mafDist = makeUniformMAFdist(low=low, high=0.5) + else: mafDist = makeTriangularMAFdist(low=low, high=0.5, beta=5) alleles,freqs, rslist, chromlist, poslist = makeMap(width=int(options.width), MAFdistribution=mafDist, useGP=False) @@ -511,25 +511,25 @@ title = string.translate(options.title,trantab) if options.outstyle == 'F': - fbatstyle = True - fbathead = makeFbathead(rslist=rslist, chromlist=chromlist, poslist=poslist, width=options.width) + fbatstyle = True + fbathead = makeFbathead(rslist=rslist, chromlist=chromlist, poslist=poslist, width=options.width) else: - fbatstyle = False - writeMap(fprefix=title, rslist=rslist, fpath=options.outpath, - chromlist=chromlist, poslist=poslist, width=options.width) - if options.Naff > 0: # make case control data - makeIndep(fprefix = title, fpath=options.outpath, - width=options.width, Nunaff=options.Ntot-options.Naff, - Naff=options.Naff, MAFdistribution=mafDist,alleles=alleles, freqs=freqs, - fbatstyle=fbatstyle, missrate=options.missrate, missval=options.missval, - fbathead=fbathead) - else: - makePed(fprefix=options.fprefix, fpath=options.fpath, - width=options.width, MAFdistribution=mafDist, nsubj=options.Ntot, - alleles=alleles, freqs=freqs, fbatstyle=fbatstyle, missrate=options.missrate, - mendrate=options.mendrate, missval=options.missval, + fbatstyle = False + writeMap(fprefix=defbasename, rslist=rslist, fpath=options.outpath, + chromlist=chromlist, poslist=poslist, width=options.width) + if options.Naff > 0: # make case control data + makeIndep(fprefix = defbasename, fpath=options.outpath, + width=options.width, Nunaff=options.Ntot-options.Naff, + Naff=options.Naff, MAFdistribution=mafDist,alleles=alleles, freqs=freqs, + fbatstyle=fbatstyle,
missrate=options.missrate, missval=options.missval, fbathead=fbathead) - doImport(outfile=options.outf,outpath=options.outpath) - - - + else: + makePed(fprefix=defbasename, fpath=options.fpath, + width=options.width, MAFdistribution=mafDist, nsubj=options.Ntot, + alleles=alleles, freqs=freqs, fbatstyle=fbatstyle, missrate=options.missrate, + mendrate=options.mendrate, missval=options.missval, + fbathead=fbathead) + doImport(outfile=options.outf,outpath=options.outpath) + + + --- a/tools/rgenetics/rgfakePed.xml Fri May 13 21:24:03 2011 -0400 +++ b/tools/rgenetics/rgfakePed.xml Mon May 16 09:12:19 2011 -0400 @@ -1,14 +1,13 @@ - +for testingrgfakePed.py --title '$title' -o '$out_file1' -p '$out_file1.files_path' -c '$ncases' -n '$ntotal' -s '$nsnp' -w '$lowmaf' -v '$missingValue' -l '$outFormat' -d '$mafdist' -m '$missingRate' -M '$mendelRate' - - - + @@ -74,8 +72,8 @@ - - + + @@ -100,11 +98,15 @@ This tool is very experimental -**Attribution** +.. class:: infomark + +**Attribution and Licensing** + Designed and written for the Rgenetics Galaxy tools copyright Ross Lazarus 2007 (ross.lazarus(a)gmail.com) -Licensed under the terms of the LGPL -as documented http://www.gnu.org/licenses/lgpl.html +Licensed under the terms of the _LGPL + + .. _LGPL: http://www.gnu.org/copyleft/lesser.html http://bitbucket.org/galaxy/galaxy-central/changeset/2790f54a4fe7/ changeset: r5566:2790f54a4fe7 user: fubar date: 2011-05-16 16:03:47 summary: removed python2.4 from rgfakePed.py affected #: 1 file (161 bytes) --- a/tools/rgenetics/rgfakePed.py Mon May 16 09:12:19 2011 -0400 +++ b/tools/rgenetics/rgfakePed.py Mon May 16 10:03:47 2011 -0400 @@ -1,4 +1,6 @@ -#!
/usr/local/bin/python2.4 +# modified may 2011 to name components (map/ped) as RgeneticsData to align with default base_name +# otherwise downstream tools fail +# modified march 2011 to remove post execution hook # pedigree data faker # specifically designed for scalability testing of # Shaun Purcel's PLINK package Repository URL: https://bitbucket.org/galaxy/galaxy-central/ -- This is a commit notification from bitbucket.org. You are receiving this because you have the service enabled, addressing the recipient of this email. --===============8860946898754127966==--
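The first changeset names every component of the composite output with Galaxy's default base name, `RgeneticsData`, because "otherwise downstream tools fail". `doImport` then lists whatever files it finds in the output directory. The following is a minimal Python 3 sketch of that convention, not the tool's code. `write_components` and `write_index` are hypothetical helpers, and the map/ped rows are placeholders. The HTML is simplified rather than taken from `galhtmlprefix`, and the listing is sorted so the output is deterministic, whereas `doImport` writes files in `glob` order.

```python
import glob
import os

# Default base_name that, per the commit summary, every component now shares.
DEFAULT_BASE_NAME = "RgeneticsData"


def write_components(files_path, map_rows, ped_rows, base_name=DEFAULT_BASE_NAME):
    """Write <base_name>.map and <base_name>.ped into files_path (hypothetical helper)."""
    os.makedirs(files_path, exist_ok=True)
    paths = []
    for ext, rows in (("map", map_rows), ("ped", ped_rows)):
        path = os.path.join(files_path, "%s.%s" % (base_name, ext))
        with open(path, "w") as f:
            f.write("\n".join(rows) + "\n")
        paths.append(path)
    return paths


def write_index(outfile, files_path):
    """Write a minimal HTML page linking every file found in files_path."""
    names = sorted(os.path.basename(p) for p in glob.glob(os.path.join(files_path, "*")))
    with open(outfile, "w") as f:
        f.write("<html><body><ol>\n")
        for name in names:
            f.write('<li><a href="%s">%s</a></li>\n' % (name, name))
        f.write("</ol></body></html>\n")
    return names


if __name__ == "__main__":
    import tempfile

    workdir = tempfile.mkdtemp()
    files_path = os.path.join(workdir, "files")  # stands in for $out_file1.files_path
    write_components(files_path, ["1 rs1 0 1000"], ["1 1 0 0 1 1 A A"])
    print(write_index(os.path.join(workdir, "index.html"), files_path))
```

The index file is written outside `files_path`, so it does not list itself. Keeping the base name fixed means a downstream tool can locate `RgeneticsData.ped` without knowing the user-supplied title.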
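In the `makeIndep` hunk, rows accumulate in `res`. They are written out whenever `id % 50 == 0`, and any remainder is written after the loop, so the whole `.ped` file never has to sit in memory. The following is a minimal Python 3 sketch of that buffering pattern under stated assumptions. `write_in_chunks` and its `make_row` callback are hypothetical stand-ins, since the tool itself calls `makePerson`. The sketch also flushes on buffer length rather than on the subject index.

```python
import io


def write_in_chunks(out, n_subjects, make_row, chunk=50, header=None):
    """Write one row per subject to `out`, flushing the buffer every `chunk` rows.

    Returns the number of write batches, to show that memory use stays bounded.
    """
    buffer = []
    batches = 0
    if header:  # optional fbat-style header row of marker names
        buffer.append(header)
    for subject_id in range(1, n_subjects + 1):
        buffer.append(make_row(subject_id))
        if len(buffer) >= chunk:  # flush to keep RAM use reasonable
            out.write("\n".join(buffer) + "\n")
            buffer = []
            batches += 1
    if buffer:  # some rows left over
        out.write("\n".join(buffer) + "\n")
        batches += 1
    return batches


if __name__ == "__main__":
    sink = io.StringIO()
    n = write_in_chunks(sink, 120, lambda i: "fam%d ind%d" % (i, i))
    print(n, len(sink.getvalue().splitlines()))  # 3 batches, 120 rows
```

Peak memory is set by `chunk` rather than by the number of subjects, which is what matters for the multi-gigabyte ped files the usage text warns about.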