3 new changesets in galaxy-central:
http://bitbucket.org/galaxy/galaxy-central/changeset/67e2f1881067/
changeset: r5572:67e2f1881067
user: fubar
date: 2011-05-17 16:29:06
summary: amerkinized spelin' and removed copyright notices on some picard tool forms
affected #: 7 files (3.3 KB)
--- a/tools/picard/rgPicardASMetrics.xml Tue May 17 01:57:04 2011 -0400
+++ b/tools/picard/rgPicardASMetrics.xml Tue May 17 10:29:06 2011 -0400
@@ -183,24 +183,6 @@
Picard and Samtools go together. They are external to and completely independent of Galaxy. We acknowledge that all credit for
their methods and contribution are due to them.
-Here, you can apply Picard tools through Galaxy which you may find much easier than through the native Picard command line interface.
-
-
-
------
-
-.. class:: infomark
-
-**Copyright**
-
-This Galaxy tool is a component of the rgenetics toolkit.
-
-Written by and copyright Ross Lazarus, ross.lazarus at gmail etc, September 2010
-
-All rgenetics artefacts are available licensed under the LGPL_
-Other dependencies are licensed at the author's discretion - please see each individual package for details
-
- .. _LGPL: http://www.gnu.org/copyleft/lesser.html
</help></tool>
--- a/tools/picard/rgPicardFixMate.xml Tue May 17 01:57:04 2011 -0400
+++ b/tools/picard/rgPicardFixMate.xml Tue May 17 10:29:06 2011 -0400
@@ -101,7 +101,7 @@
if a tool fails unexpectedly when you run it in Galaxy.
All the Picard tools are freely available and are documented
-at http://picard.sourceforge.net/command-line-overview.shtml#CollectAlignmentS…
+at http://picard.sourceforge.net/command-line-overview.shtml
Written by and copyright Ross Lazarus, ross.lazarus at gmail etc, September 2010
Code cleaned up and the ugliest hacks repaired by Raphael Lullis.
@@ -109,18 +109,6 @@
It takes a village of programmers to wrap a picard tool
------
-
-.. class:: infomark
-
-**Copyright**
-
-This Galaxy tool is a component of the rgenetics toolkit.
-
-All rgenetics artefacts are available licensed under the LGPL_
-Other dependencies are licensed at the author's discretion - please see each individual package for details
-
- .. _LGPL: http://www.gnu.org/copyleft/lesser.html
</help></tool>
--- a/tools/picard/rgPicardGCBiasMetrics.xml Tue May 17 01:57:04 2011 -0400
+++ b/tools/picard/rgPicardGCBiasMetrics.xml Tue May 17 10:29:06 2011 -0400
@@ -162,22 +162,5 @@
They are external to and completely independent of Galaxy. We acknowledge that all credit for
their methods and contribution are due to them.
-Here, you can apply Picard tools through Galaxy which might be easier than through the native Picard command line interface.
-
-
------
-
-.. class:: infomark
-
-**Copyright**
-
-This Galaxy tool is a component of the rgenetics toolkit.
-
-Written by and copyright Ross Lazarus, ross.lazarus at gmail etc, September 2010
-All rgenetics artefacts are available licensed under the LGPL_
-Other dependencies are licensed at the author's discretion - please see each individual package for details
-
- .. _LGPL: http://www.gnu.org/copyleft/lesser.html
-
</help></tool>
--- a/tools/picard/rgPicardHsMetrics.xml Tue May 17 01:57:04 2011 -0400
+++ b/tools/picard/rgPicardHsMetrics.xml Tue May 17 10:29:06 2011 -0400
@@ -1,5 +1,5 @@
<tool name="Sam/bam Hybrid Selection Metrics:" id="PicardHsMetrics" version="0.01">
- <description>For (eg exome) targetted data</description>
+ <description>For (eg exome) targeted data</description><command interpreter="python">
picard_wrapper.py -i "$input_file" -d "$html_file.files_path" -t "$html_file" --datatype "$input_file.ext"
@@ -139,19 +139,6 @@
aligned short read sequence data. Sequence data must be chosen from the sam/bam format files in your current history.
Target and bait files must be selected from the UCSC BED format in your current history.
------
-
-.. class:: infomark
-
-**Copyright**
-
-This is part of the rgenetics toolkit.
-
-Written by Ross Lazarus 2010
-All rgenetics artefacts are available licensed under the LGPL_
-Other dependencies are licensed at the author's discretion - please see each individual package for details
-
- .. _LGPL: http://www.gnu.org/copyleft/lesser.html
</help></tool>
--- a/tools/picard/rgPicardInsertSize.xml Tue May 17 01:57:04 2011 -0400
+++ b/tools/picard/rgPicardInsertSize.xml Tue May 17 10:29:06 2011 -0400
@@ -78,23 +78,8 @@
if a tool fails unexpectedly when you run it in Galaxy.
All the Picard tools are freely available and are documented
-at http://picard.sourceforge.net/command-line-overview.shtml#CollectInsertSize…
+at http://picard.sourceforge.net/command-line-overview.shtml
-Here, you can apply Picard tools through Galaxy which might be easier than through the native Picard command line interface.
-
------
-
-.. class:: infomark
-
-**Copyright**
-
-This Galaxy tool is a component of the rgenetics toolkit.
-
-Written by and copyright Ross Lazarus, ross.lazarus at gmail etc, September 2010
-Code cleaned up and the ugliest hacks repaired by Raphael Lullis
-
-All rgenetics artifacts are available licensed under the LGPL
-Other dependencies are licensed at the author's discretion - please see each individual package for details
</help></tool>
--- a/tools/picard/rgPicardLibComplexity.xml Tue May 17 01:57:04 2011 -0400
+++ b/tools/picard/rgPicardLibComplexity.xml Tue May 17 10:29:06 2011 -0400
@@ -121,22 +121,6 @@
All the Picard tools are freely available and are documented
at http://picard.sourceforge.net/command-line-overview.shtml#CollectInsertSize…
------
-
-.. class:: infomark
-
-**Copyright**
-
-This Galaxy tool is a component of the rgenetics toolkit.
-
-Written by and copyright Ross Lazarus, ross.lazarus at gmail etc, September 2010
-Code cleaned up and the ugliest hacks repaired by Raphael Lullis
-
-All rgenetics artefacts are available licensed under the LGPL_
-Other dependencies are licensed at the author's discretion - please see each individual package for details
-
- .. _LGPL: http://www.gnu.org/copyleft/lesser.html
-
</help></tool>
--- a/tools/picard/rgPicardMarkDups.xml Tue May 17 01:57:04 2011 -0400
+++ b/tools/picard/rgPicardMarkDups.xml Tue May 17 10:29:06 2011 -0400
@@ -120,24 +120,6 @@
All the Picard tools are freely available and are documented
at http://picard.sourceforge.net/command-line-overview.shtml#CollectInsertSize…
-Here, you can apply Picard tools through Galaxy which might be easier than through the native Picard command line interface.
-
------
-
-.. class:: infomark
-
-**Copyright**
-
-This Galaxy tool is a component of the rgenetics toolkit.
-
-Written by and copyright Ross Lazarus, ross.lazarus at gmail etc, September 2010
-Code cleaned up and the ugliest hacks repaired by Raphael Lullis
-
-All rgenetics artifacts are available licensed under the LGPL_
-Other dependencies are licensed at the author's discretion - please see each individual package for details
-
-.. _LGPL: http://www.gnu.org/copyleft/lesser.html
-
</help></tool>
http://bitbucket.org/galaxy/galaxy-central/changeset/d6a905426210/
changeset: r5573:d6a905426210
user: fubar
date: 2011-05-17 17:39:52
summary: branch merge
affected #: 2 files (331 bytes)
--- a/tools/ngs_rna/cufflinks_wrapper.py Tue May 17 10:29:06 2011 -0400
+++ b/tools/ngs_rna/cufflinks_wrapper.py Tue May 17 11:39:52 2011 -0400
@@ -34,9 +34,12 @@
where each end is 50bp, you should set -r to be 200. The default is 45bp.')
parser.add_option( '-G', '--GTF', dest='GTF', help='Tells Cufflinks to use the supplied reference annotation to estimate isoform expression. It will not assemble novel transcripts, and the program will ignore alignments not structurally compatible with any reference transcript.' )
- # Normalization options.
+ # Normalization options.
parser.add_option( "-N", "--quartile-normalization", dest="do_normalization", action="store_true" )
+ # Wrapper / Galaxy options.
+ parser.add_option( '-A', '--assembled-isoforms-output', dest='assembled_isoforms_output_file', help='Assembled isoforms output file; formate is GTF.' )
+
# Advanced Options:
parser.add_option( '--num-importance-samples', dest='num_importance_samples', help='Sets the number of importance samples generated for each locus during abundance estimation. Default: 1000' )
parser.add_option( '--max-mle-iterations', dest='max_mle_iterations', help='Sets the number of iterations allowed during maximum likelihood estimation of abundances. Default: 5000' )
@@ -136,6 +139,9 @@
except OverflowError:
pass
tmp_stderr.close()
+
+ # Copy outputs.
+ shutil.copyfile( "transcripts.gtf" , options.assembled_isoforms_output_file )
# Error checking.
if returncode != 0:
--- a/tools/ngs_rna/cufflinks_wrapper.xml Tue May 17 10:29:06 2011 -0400
+++ b/tools/ngs_rna/cufflinks_wrapper.xml Tue May 17 11:39:52 2011 -0400
@@ -6,6 +6,7 @@
<command interpreter="python">
cufflinks_wrapper.py
--input=$input
+ --assembled-isoforms-output=$assembled_isoforms
--num-threads="4"
-I $max_intron_len
-F $min_isoform_fraction
@@ -95,7 +96,7 @@
<outputs><data format="tabular" name="genes_expression" label="${tool.name} on ${on_string}: gene expression" from_work_dir="genes.fpkm_tracking"/><data format="tabular" name="transcripts_expression" label="${tool.name} on ${on_string}: transcript expression" from_work_dir="isoforms.fpkm_tracking"/>
- <data format="gtf" name="assembled_isoforms" label="${tool.name} on ${on_string}: assembled transcripts" from_work_dir="transcripts.gtf"/>
+ <data format="gtf" name="assembled_isoforms" label="${tool.name} on ${on_string}: assembled transcripts"/></outputs><tests>
http://bitbucket.org/galaxy/galaxy-central/changeset/56b4c606cfd5/
changeset: r5574:56b4c606cfd5
user: fubar
date: 2011-05-17 18:14:32
summary: branch merge with Kelly's fixes
affected #: 5 files (16.5 KB)
--- a/tools/maf/genebed_maf_to_fasta.xml Tue May 17 11:39:52 2011 -0400
+++ b/tools/maf/genebed_maf_to_fasta.xml Tue May 17 12:14:32 2011 -0400
@@ -8,6 +8,7 @@
<inputs><param name="input1" type="data" format="bed" label="Gene BED File"><validator type="unspecified_build" />
+ <validator type="expression" message="Input must be in BED12 format.">value.metadata.columns >= 12</validator><!-- allow 12+ columns, not as strict as possible. TODO: only list bed files with 12+ columns --></param><conditional name="maf_source_type"><param name="maf_source" type="select" label="MAF Source">
--- a/tools/maf/interval2maf.xml Tue May 17 11:39:52 2011 -0400
+++ b/tools/maf/interval2maf.xml Tue May 17 12:14:32 2011 -0400
@@ -55,7 +55,7 @@
</when></conditional><conditional name="split_blocks_by_species_selector">
- <param name="split_blocks_by_species" type="select" label="Split blocks by species" help="See the Split MAF blocks by Species tool for more information.">
+ <param name="split_blocks_by_species" type="select" label="Split blocks by species" help="Not usually applicable. See help below for more information."><option value="split_blocks_by_species">Split by species</option><option value="dont_split_blocks_by_species" selected="true">Do not split</option></param>
@@ -112,5 +112,176 @@
.. image:: ./static/images/maf_icons/interval2maf.png
+-------
+
+**Split blocks by species**
+
+This option examines each MAF block for multiple occurrences of a species in a single block. When this occurs, a block is split into multiple blocks where every combination of one sequence per species per block is represented.
+
+The interface for this option has two inputs:
+
+ * **MAF file to split**. Choose multiple alignments from history to be split by species.
+ * **Collapse empty alignment columns**. Should alignment columns containing only gaps in the new blocks be removed.
+
+
+
+**Example 1**: **Collapse empty alignment columns is Yes**:
+
+For the following alignment::
+
+ ##maf version=1
+ a score=2047408.0
+ s species1.chr1 147984545 85 + 245522847 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTTCGGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTTGTCCTCAG
+ s species1.chr1 147984545 83 - 245522847 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTT--GTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTTGTCCTCAG
+ s species1.chr1 147984645 79 + 245522847 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTTCGGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTT------AG
+ s species1.chr1 147984645 79 - 245522847 ATGGCGTCGGCCTCCTCCGGGCCGTCGTC---GGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTTGTC---AG
+ s species2.chr1 129723125 85 + 229575298 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTTCGGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTCGTCCTCAG
+ s species2.chr1 129723125 83 - 229575298 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCT--GGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTCGTCCTCAG
+ s species2.chr1 129723925 79 + 229575298 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTTCGGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTC------AG
+ s species3.chr3 68255714 76 - 258222147 ATGGCGTCCGCCTCCTCAGGGCCAGCGGC---GGCGGGGTTTTCACCCCTTGATTCCGGGGTCCCTGCCGGTACCGC------AG
+
+the tool will create **a single** history item containing 12 alignment blocks (notice that no columns contain only gaps)::
+
+ ##maf version=1
+ a score=2047408.0
+ s species1.chr1 147984545 85 + 245522847 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTTCGGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTTGTCCTCAG
+ s species2.chr1 129723125 85 + 229575298 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTTCGGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTCGTCCTCAG
+ s species3.chr3 68255714 76 - 258222147 ATGGCGTCCGCCTCCTCAGGGCCAGCGGC---GGCGGGGTTTTCACCCCTTGATTCCGGGGTCCCTGCCGGTACCGC------AG
+
+ a score=2047408.0
+ s species1.chr1 147984545 83 - 245522847 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTT--GTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTTGTCCTCAG
+ s species2.chr1 129723125 85 + 229575298 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTTCGGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTCGTCCTCAG
+ s species3.chr3 68255714 76 - 258222147 ATGGCGTCCGCCTCCTCAGGGCCAGCGGC---GGCGGGGTTTTCACCCCTTGATTCCGGGGTCCCTGCCGGTACCGC------AG
+
+ a score=2047408.0
+ s species1.chr1 147984645 79 + 245522847 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTTCGGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTT------AG
+ s species2.chr1 129723125 85 + 229575298 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTTCGGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTCGTCCTCAG
+ s species3.chr3 68255714 76 - 258222147 ATGGCGTCCGCCTCCTCAGGGCCAGCGGC---GGCGGGGTTTTCACCCCTTGATTCCGGGGTCCCTGCCGGTACCGC------AG
+
+ a score=2047408.0
+ s species1.chr1 147984645 79 - 245522847 ATGGCGTCGGCCTCCTCCGGGCCGTCGTC---GGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTTGTC---AG
+ s species2.chr1 129723125 85 + 229575298 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTTCGGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTCGTCCTCAG
+ s species3.chr3 68255714 76 - 258222147 ATGGCGTCCGCCTCCTCAGGGCCAGCGGC---GGCGGGGTTTTCACCCCTTGATTCCGGGGTCCCTGCCGGTACCGC------AG
+
+ a score=2047408.0
+ s species1.chr1 147984545 85 + 245522847 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTTCGGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTTGTCCTCAG
+ s species2.chr1 129723125 83 - 229575298 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCT--GGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTCGTCCTCAG
+ s species3.chr3 68255714 76 - 258222147 ATGGCGTCCGCCTCCTCAGGGCCAGCGGC---GGCGGGGTTTTCACCCCTTGATTCCGGGGTCCCTGCCGGTACCGC------AG
+
+ a score=2047408.0
+ s species1.chr1 147984545 83 - 245522847 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTT-GTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTTGTCCTCAG
+ s species2.chr1 129723125 83 - 229575298 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCT-GGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTCGTCCTCAG
+ s species3.chr3 68255714 76 - 258222147 ATGGCGTCCGCCTCCTCAGGGCCAGCGGC--GGCGGGGTTTTCACCCCTTGATTCCGGGGTCCCTGCCGGTACCGC------AG
+
+ a score=2047408.0
+ s species1.chr1 147984645 79 + 245522847 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTTCGGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTT------AG
+ s species2.chr1 129723125 83 - 229575298 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCT--GGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTCGTCCTCAG
+ s species3.chr3 68255714 76 - 258222147 ATGGCGTCCGCCTCCTCAGGGCCAGCGGC---GGCGGGGTTTTCACCCCTTGATTCCGGGGTCCCTGCCGGTACCGC------AG
+
+ a score=2047408.0
+ s species1.chr1 147984645 79 - 245522847 ATGGCGTCGGCCTCCTCCGGGCCGTCGTC-GGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTTGTC---AG
+ s species2.chr1 129723125 83 - 229575298 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTGGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTCGTCCTCAG
+ s species3.chr3 68255714 76 - 258222147 ATGGCGTCCGCCTCCTCAGGGCCAGCGGC-GGCGGGGTTTTCACCCCTTGATTCCGGGGTCCCTGCCGGTACCGC------AG
+
+ a score=2047408.0
+ s species1.chr1 147984545 85 + 245522847 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTTCGGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTTGTCCTCAG
+ s species2.chr1 129723925 79 + 229575298 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTTCGGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTC------AG
+ s species3.chr3 68255714 76 - 258222147 ATGGCGTCCGCCTCCTCAGGGCCAGCGGC---GGCGGGGTTTTCACCCCTTGATTCCGGGGTCCCTGCCGGTACCGC------AG
+
+ a score=2047408.0
+ s species1.chr1 147984545 83 - 245522847 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTT--GTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTTGTCCTCAG
+ s species2.chr1 129723925 79 + 229575298 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTTCGGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTC------AG
+ s species3.chr3 68255714 76 - 258222147 ATGGCGTCCGCCTCCTCAGGGCCAGCGGC---GGCGGGGTTTTCACCCCTTGATTCCGGGGTCCCTGCCGGTACCGC------AG
+
+ a score=2047408.0
+ s species1.chr1 147984645 79 + 245522847 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTTCGGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTTAG
+ s species2.chr1 129723925 79 + 229575298 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTTCGGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTCAG
+ s species3.chr3 68255714 76 - 258222147 ATGGCGTCCGCCTCCTCAGGGCCAGCGGC---GGCGGGGTTTTCACCCCTTGATTCCGGGGTCCCTGCCGGTACCGCAG
+
+ a score=2047408.0
+ s species1.chr1 147984645 79 - 245522847 ATGGCGTCGGCCTCCTCCGGGCCGTCGTC---GGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTTGTCAG
+ s species2.chr1 129723925 79 + 229575298 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTTCGGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTC---AG
+ s species3.chr3 68255714 76 - 258222147 ATGGCGTCCGCCTCCTCAGGGCCAGCGGC---GGCGGGGTTTTCACCCCTTGATTCCGGGGTCCCTGCCGGTACCGC---AG
+
+
+
+**Example 2**: **Collapse empty alignment columns is No**:
+
+For the following alignment::
+
+ ##maf version=1
+ a score=2047408.0
+ s species1.chr1 147984545 85 + 245522847 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTTCGGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTTGTCCTCAG
+ s species1.chr1 147984545 83 - 245522847 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTT--GTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTTGTCCTCAG
+ s species1.chr1 147984645 79 + 245522847 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTTCGGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTT------AG
+ s species1.chr1 147984645 79 - 245522847 ATGGCGTCGGCCTCCTCCGGGCCGTCGTC---GGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTTGTC---AG
+ s species2.chr1 129723125 85 + 229575298 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTTCGGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTCGTCCTCAG
+ s species2.chr1 129723125 83 - 229575298 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCT--GGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTCGTCCTCAG
+ s species2.chr1 129723925 79 + 229575298 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTTCGGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTC------AG
+ s species3.chr3 68255714 76 - 258222147 ATGGCGTCCGCCTCCTCAGGGCCAGCGGC---GGCGGGGTTTTCACCCCTTGATTCCGGGGTCCCTGCCGGTACCGC------AG
+
+the tool will create **a single** history item containing 12 alignment blocks (notice that some columns contain only gaps)::
+
+ ##maf version=1
+ a score=2047408.0
+ s species1.chr1 147984545 85 + 245522847 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTTCGGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTTGTCCTCAG
+ s species2.chr1 129723125 85 + 229575298 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTTCGGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTCGTCCTCAG
+ s species3.chr3 68255714 76 - 258222147 ATGGCGTCCGCCTCCTCAGGGCCAGCGGC---GGCGGGGTTTTCACCCCTTGATTCCGGGGTCCCTGCCGGTACCGC------AG
+
+ a score=2047408.0
+ s species1.chr1 147984545 83 - 245522847 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTT--GTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTTGTCCTCAG
+ s species2.chr1 129723125 85 + 229575298 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTTCGGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTCGTCCTCAG
+ s species3.chr3 68255714 76 - 258222147 ATGGCGTCCGCCTCCTCAGGGCCAGCGGC---GGCGGGGTTTTCACCCCTTGATTCCGGGGTCCCTGCCGGTACCGC------AG
+
+ a score=2047408.0
+ s species1.chr1 147984645 79 + 245522847 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTTCGGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTT------AG
+ s species2.chr1 129723125 85 + 229575298 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTTCGGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTCGTCCTCAG
+ s species3.chr3 68255714 76 - 258222147 ATGGCGTCCGCCTCCTCAGGGCCAGCGGC---GGCGGGGTTTTCACCCCTTGATTCCGGGGTCCCTGCCGGTACCGC------AG
+
+ a score=2047408.0
+ s species1.chr1 147984645 79 - 245522847 ATGGCGTCGGCCTCCTCCGGGCCGTCGTC---GGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTTGTC---AG
+ s species2.chr1 129723125 85 + 229575298 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTTCGGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTCGTCCTCAG
+ s species3.chr3 68255714 76 - 258222147 ATGGCGTCCGCCTCCTCAGGGCCAGCGGC---GGCGGGGTTTTCACCCCTTGATTCCGGGGTCCCTGCCGGTACCGC------AG
+
+ a score=2047408.0
+ s species1.chr1 147984545 85 + 245522847 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTTCGGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTTGTCCTCAG
+ s species2.chr1 129723125 83 - 229575298 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCT--GGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTCGTCCTCAG
+ s species3.chr3 68255714 76 - 258222147 ATGGCGTCCGCCTCCTCAGGGCCAGCGGC---GGCGGGGTTTTCACCCCTTGATTCCGGGGTCCCTGCCGGTACCGC------AG
+
+ a score=2047408.0
+ s species1.chr1 147984545 83 - 245522847 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTT--GTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTTGTCCTCAG
+ s species2.chr1 129723125 83 - 229575298 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCT--GGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTCGTCCTCAG
+ s species3.chr3 68255714 76 - 258222147 ATGGCGTCCGCCTCCTCAGGGCCAGCGGC---GGCGGGGTTTTCACCCCTTGATTCCGGGGTCCCTGCCGGTACCGC------AG
+
+ a score=2047408.0
+ s species1.chr1 147984645 79 + 245522847 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTTCGGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTT------AG
+ s species2.chr1 129723125 83 - 229575298 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCT--GGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTCGTCCTCAG
+ s species3.chr3 68255714 76 - 258222147 ATGGCGTCCGCCTCCTCAGGGCCAGCGGC---GGCGGGGTTTTCACCCCTTGATTCCGGGGTCCCTGCCGGTACCGC------AG
+
+ a score=2047408.0
+ s species1.chr1 147984645 79 - 245522847 ATGGCGTCGGCCTCCTCCGGGCCGTCGTC---GGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTTGTC---AG
+ s species2.chr1 129723125 83 - 229575298 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCT--GGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTCGTCCTCAG
+ s species3.chr3 68255714 76 - 258222147 ATGGCGTCCGCCTCCTCAGGGCCAGCGGC---GGCGGGGTTTTCACCCCTTGATTCCGGGGTCCCTGCCGGTACCGC------AG
+
+ a score=2047408.0
+ s species1.chr1 147984545 85 + 245522847 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTTCGGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTTGTCCTCAG
+ s species2.chr1 129723925 79 + 229575298 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTTCGGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTC------AG
+ s species3.chr3 68255714 76 - 258222147 ATGGCGTCCGCCTCCTCAGGGCCAGCGGC---GGCGGGGTTTTCACCCCTTGATTCCGGGGTCCCTGCCGGTACCGC------AG
+
+ a score=2047408.0
+ s species1.chr1 147984545 83 - 245522847 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTT--GTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTTGTCCTCAG
+ s species2.chr1 129723925 79 + 229575298 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTTCGGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTC------AG
+ s species3.chr3 68255714 76 - 258222147 ATGGCGTCCGCCTCCTCAGGGCCAGCGGC---GGCGGGGTTTTCACCCCTTGATTCCGGGGTCCCTGCCGGTACCGC------AG
+
+ a score=2047408.0
+ s species1.chr1 147984645 79 + 245522847 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTTCGGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTT------AG
+ s species2.chr1 129723925 79 + 229575298 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTTCGGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTC------AG
+ s species3.chr3 68255714 76 - 258222147 ATGGCGTCCGCCTCCTCAGGGCCAGCGGC---GGCGGGGTTTTCACCCCTTGATTCCGGGGTCCCTGCCGGTACCGC------AG
+
+ a score=2047408.0
+ s species1.chr1 147984645 79 - 245522847 ATGGCGTCGGCCTCCTCCGGGCCGTCGTC---GGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTTGTC---AG
+ s species2.chr1 129723925 79 + 229575298 ATGGCGTCGGCCTCCTCCGGGCCGTCGTCTTCGGTCGGTTTTTCATCCTTTGATCCCGCGGTCCCTTCCTGTACCTC------AG
+ s species3.chr3 68255714 76 - 258222147 ATGGCGTCCGCCTCCTCAGGGCCAGCGGC---GGCGGGGTTTTCACCCCTTGATTCCGGGGTCCCTGCCGGTACCGC------AG
+
</help></tool>
--- a/tools/maf/maf_to_bed.xml Tue May 17 11:39:52 2011 -0400
+++ b/tools/maf/maf_to_bed.xml Tue May 17 12:14:32 2011 -0400
@@ -1,5 +1,5 @@
<tool id="MAF_To_BED1" name="Maf to BED" force_history_refresh="True">
- <description>Converts a MAF formated file to the BED format</description>
+ <description>Converts a MAF formatted file to the BED format</description><command interpreter="python">maf_to_bed.py $input1 $out_file1 $species $complete_blocks $__new_file_path__</command><inputs><param format="maf" name="input1" type="data" label="MAF file to convert"/>
--- a/tools/maf/maf_to_fasta.xml Tue May 17 11:39:52 2011 -0400
+++ b/tools/maf/maf_to_fasta.xml Tue May 17 12:14:32 2011 -0400
@@ -1,5 +1,5 @@
<tool id="MAF_To_Fasta1" name="MAF to FASTA" version="1.0.1">
- <description>Converts a MAF formated file to FASTA format</description>
+ <description>Converts a MAF formatted file to FASTA format</description><command interpreter="python">
#if $fasta_target_type.fasta_type == "multiple" #maf_to_fasta_multiple_sets.py $input1 $out_file1 $fasta_target_type.species $fasta_target_type.complete_blocks
#else #maf_to_fasta_concat.py $fasta_target_type.species $input1 $out_file1
--- a/tools/ngs_rna/tophat_wrapper.xml Tue May 17 11:39:52 2011 -0400
+++ b/tools/ngs_rna/tophat_wrapper.xml Tue May 17 12:14:32 2011 -0400
@@ -1,4 +1,4 @@
-<tool id="tophat" name="Tophat" version="1.2.0">
+<tool id="tophat" name="Tophat" version="1.2.1"><description>Find splice junctions using RNA-seq data</description><requirements><requirement type="package">tophat</requirement>
@@ -100,7 +100,7 @@
--seg-length=$singlePaired.pParams.seg_length
--library-type=$singlePaired.pParams.library_type
- ## Indel search.
+ ## Indel search.
#if $singlePaired.pParams.indel_search.allow_indel_search == "Yes":
--allow-indels
--max-insertion-length $singlePaired.pParams.indel_search.max_insertion_length
@@ -386,6 +386,23 @@
( singlePaired['pParams']['indel_search']['allow_indel_search'] == 'Yes' ) )
)
</filter>
+ <actions>
+ <conditional name="refGenomeSource.genomeSource">
+ <when value="indexed">
+ <action type="metadata" name="dbkey">
+ <option type="from_data_table" name="tophat_indexes" column="1" offset="0">
+ <filter type="param_value" column="0" value="#" compare="startswith" keep="False"/>
+ <filter type="param_value" ref="refGenomeSource.index" column="0"/>
+ </option>
+ </action>
+ </when>
+ <when value="history">
+ <action type="metadata" name="dbkey">
+ <option type="from_param" name="refGenomeSource.ownFile" param_attribute="dbkey" />
+ </action>
+ </when>
+ </conditional>
+ </actions></data><data format="bed" name="deletions" label="${tool.name} on ${on_string}: deletions" from_work_dir="tophat_out/deletions.bed"><filter>
@@ -396,9 +413,62 @@
( singlePaired['pParams']['indel_search']['allow_indel_search'] == 'Yes' ) )
)
</filter>
+ <actions>
+ <conditional name="refGenomeSource.genomeSource">
+ <when value="indexed">
+ <action type="metadata" name="dbkey">
+ <option type="from_data_table" name="tophat_indexes" column="1" offset="0">
+ <filter type="param_value" column="0" value="#" compare="startswith" keep="False"/>
+ <filter type="param_value" ref="refGenomeSource.index" column="0"/>
+ </option>
+ </action>
+ </when>
+ <when value="history">
+ <action type="metadata" name="dbkey">
+ <option type="from_param" name="refGenomeSource.ownFile" param_attribute="dbkey" />
+ </action>
+ </when>
+ </conditional>
+ </actions></data>
- <data format="bed" name="junctions" label="${tool.name} on ${on_string}: splice junctions"/>
- <data format="bam" name="accepted_hits" label="${tool.name} on ${on_string}: accepted_hits"/>
+ <data format="bed" name="junctions" label="${tool.name} on ${on_string}: splice junctions">
+ <actions>
+ <conditional name="refGenomeSource.genomeSource">
+ <when value="indexed">
+ <action type="metadata" name="dbkey">
+ <option type="from_data_table" name="tophat_indexes" column="1" offset="0">
+ <filter type="param_value" column="0" value="#" compare="startswith" keep="False"/>
+ <filter type="param_value" ref="refGenomeSource.index" column="0"/>
+ </option>
+ </action>
+ </when>
+ <when value="history">
+ <action type="metadata" name="dbkey">
+ <option type="from_param" name="refGenomeSource.ownFile" param_attribute="dbkey" />
+ </action>
+ </when>
+ </conditional>
+ </actions>
+ </data>
+ <data format="bam" name="accepted_hits" label="${tool.name} on ${on_string}: accepted_hits">
+ <actions>
+ <conditional name="refGenomeSource.genomeSource">
+ <when value="indexed">
+ <action type="metadata" name="dbkey">
+ <option type="from_data_table" name="tophat_indexes" column="1" offset="0">
+ <filter type="param_value" column="0" value="#" compare="startswith" keep="False"/>
+ <filter type="param_value" ref="refGenomeSource.index" column="0"/>
+ </option>
+ </action>
+ </when>
+ <when value="history">
+ <action type="metadata" name="dbkey">
+ <option type="from_param" name="refGenomeSource.ownFile" param_attribute="dbkey" />
+ </action>
+ </when>
+ </conditional>
+ </actions>
+ </data></outputs><tests>
Repository URL: https://bitbucket.org/galaxy/galaxy-central/
--
This is a commit notification from bitbucket.org. You are receiving
this because you have the service enabled, addressing the recipient of
this email.
1 new changeset in galaxy-central:
http://bitbucket.org/galaxy/galaxy-central/changeset/a66255ab6b93/
changeset: r5567:a66255ab6b93
user: kanwei
date: 2011-05-16 23:08:06
summary: trackster: Refactor LineTrack overflow, fix it to work better for Line and Filled modes
affected #: 2 files (679 bytes)
--- a/lib/galaxy/visualization/tracks/data_providers.py Mon May 16 10:03:47 2011 -0400
+++ b/lib/galaxy/visualization/tracks/data_providers.py Mon May 16 17:08:06 2011 -0400
@@ -336,7 +336,8 @@
num_points = (end-start) / 1280
if num_points < 1:
num_points = end - start
- num_points = max(num_points, 10)
+ else:
+ num_points = min(num_points, 500)
data = bbi.query(chrom, start, end, num_points)
f.close()
--- a/static/scripts/trackster.js Mon May 16 10:03:47 2011 -0400
+++ b/static/scripts/trackster.js Mon May 16 17:08:06 2011 -0400
@@ -3062,14 +3062,13 @@
// Pixel position of 0 on the y axis
var y_zero = Math.round( height + min_value / vertical_range * height );
- // Line at 0.0
+ // Horizontal line to denote x-axis
if ( mode !== "Intensity" ) {
ctx.fillStyle = "#aaa";
ctx.fillRect( 0, y_zero, width, 1 );
}
ctx.beginPath();
- ctx.fillStyle = this.prefs.color;
var x_scaled, y, delta_x_px;
if (data.length > 1) {
delta_x_px = Math.ceil((data[1][0] - data[0][0]) * w_scale);
@@ -3077,8 +3076,10 @@
delta_x_px = 10;
}
for (var i = 0, len = data.length; i < len; i++) {
+ ctx.fillStyle = this.prefs.color;
x_scaled = Math.round((data[i][0] - view_start) * w_scale);
y = data[i][1];
+ var top_overflow = false, bot_overflow = false;
if (y === null) {
if (in_path && mode === "Filled") {
ctx.lineTo(x_scaled, height_px);
@@ -3087,8 +3088,10 @@
continue;
}
if (y < min_value) {
+ bot_overflow = true;
y = min_value;
} else if (y > max_value) {
+ top_overflow = true;
y = max_value;
}
@@ -3116,6 +3119,24 @@
}
}
}
+ // Draw lines at boundaries if overflowing min or max
+ ctx.fillStyle = this.prefs.overflow_color;
+ if (top_overflow || bot_overflow) {
+ var overflow_x;
+ if (mode === "Histogram" || mode === "Intensity") {
+ overflow_x = delta_x_px;
+ } else { // Line and Filled, which are points
+ x_scaled -= 2; // Move it over to the left so it's centered on the point
+ overflow_x = 4;
+ }
+ if (top_overflow) {
+ ctx.fillRect(x_scaled, 0, overflow_x, 3);
+ }
+ if (bot_overflow) {
+ ctx.fillRect(x_scaled, height_px - 3, overflow_x, 3);
+ }
+ }
+ ctx.fillStyle = this.prefs.color;
}
if (mode === "Filled") {
if (in_path) {
@@ -3127,38 +3148,6 @@
ctx.stroke();
}
- // Draw lines at bounderies if overflowing min or max
- var overflow_min_start = -1,
- overflow_max_start = -1;
- ctx.fillStyle = this.prefs.overflow_color;
- var last_x_scaled;
- for (var i = 0, len = data.length; i < len; i++) {
- y = data[i][1];
- x_scaled = Math.round((data[i][0] - view_start) * w_scale);
-
- // If we are in a min/max run, check if it should be ended
- if ( overflow_max_start >= 0 && ( y === null || y < max_value ) ) {
- // Value does not exist or is in valid range, any overflow ends
- ctx.fillRect( overflow_max_start, 0, last_x_scaled + delta_x_px - overflow_max_start, 2 );
- overflow_max_start = -1;
- } else if ( overflow_min_start >= 0 && ( y === null || y > min_value ) ) {
- // Draw bottom overflow bar
- ctx.fillRect( overflow_min_start, height - 2, last_x_scaled + delta_x_px - overflow_min_start, 2 );
- overflow_min_start = -1;
- }
-
- // Now check if we should start a new one (this may happen on the same
- // base as above if switching between min/max)
- if ( y !== null && y > max_value && overflow_max_start < 0 ) {
- // Top overflows and we are not already in a run of overflow
- overflow_max_start = x_scaled;
- } else if ( y !== null && y < min_value && overflow_min_start < 0 ) {
- // Bottom overflows and we are not already in a run
- overflow_min_start = x_scaled;
- }
- last_x_scaled = x_scaled;
- }
-
ctx.restore();
}
Repository URL: https://bitbucket.org/galaxy/galaxy-central/
--
This is a commit notification from bitbucket.org. You are receiving
this because you have the service enabled, addressing the recipient of
this email.
2 new changesets in galaxy-central:
http://bitbucket.org/galaxy/galaxy-central/changeset/0d6acef66750/
changeset: r5565:0d6acef66750
user: fubar
date: 2011-05-16 15:12:19
summary: repairs to rgfakePed tool for generating null genotype data - since removing the _code.py trick of renaming composite file components means all of the parts (eg map and ped) acquire the default name of RgeneticsData, that's what the tool has to create - rather than sensibly named components based on the title - since only way to fix this wart is to reintroduce a bigger wart using the post execution hook, decided to just go with that default base_name to make things more sustanable in the long term
affected #: 3 files (594 bytes)
--- a/test-data/rgtestouts/rgfakePed/rgfakePedtest1.lped Fri May 13 21:24:03 2011 -0400
+++ b/test-data/rgtestouts/rgfakePed/rgfakePedtest1.lped Mon May 16 09:12:19 2011 -0400
@@ -9,8 +9,8 @@
</head><body><div class="document">
-<li><a href="rgfakePedtest1.ped">rgfakePedtest1.ped</a></li>
-<li><a href="rgfakePedtest1.map">rgfakePedtest1.map</a></li>
-<br><h3>This is simulated null genotype data generated by Rgenetics!</h3>rgfakePed.py called with command line:<br><pre>/share/shared/galaxy/tools/rgenetics/rgfakePed.py --title rgfakePedtest1 -o /share/shared/galaxy/test-data/rgtestouts/rgfakePed/rgfakePedtest1.lped -p /share/shared/galaxy/test-data/rgtestouts/rgfakePed -c 20 -n 40 -s 10 -w 0 -v 0 -l pbed -d T -m 0 -M 0
+<li><a href="RgeneticsData.map">RgeneticsData.map</a></li>
+<li><a href="RgeneticsData.ped">RgeneticsData.ped</a></li>
+<br><h3>This is simulated null genotype data generated by Rgenetics!</h3>rgfakePed.py called with command line:<br><pre>/udd/rerla/galaxy-central/tools/rgenetics/rgfakePed.py --title rgfakePedtest1 -o /export/tmp/tmpy7a643/database/files/000/dataset_1.dat -p /udd/rerla/galaxy-central/database/job_working_directory/1/dataset_1_files -c 20 -n 40 -s 10 -w 0.0 -v 0 -l L -d T -m 0.0 -M 0.0
</pre></div></body></html>
\ No newline at end of file
--- a/tools/rgenetics/rgfakePed.py Fri May 13 21:24:03 2011 -0400
+++ b/tools/rgenetics/rgfakePed.py Mon May 16 09:12:19 2011 -0400
@@ -1,103 +1,103 @@
-#! /usr/local/bin/python2.4
-# pedigree data faker
-# specifically designed for scalability testing of
-# Shaun Purcel's PLINK package
-# derived from John Ziniti's original suggestion
-# allele frequency spectrum and random mating added
-# ross lazarus me fecit january 13 2007
-# copyright ross lazarus 2007
-# without psyco
-# generates about 10k snp genotypes in 2k subjects (666 trios) per minute or so.
-# so 500k (a billion genotypes), at about 4 trios/min will a couple of hours to generate
-# psyco makes it literally twice as quick!!
-# all rights reserved except as granted under the terms of the LGPL
-# see http://www.gnu.org/licenses/lgpl.html
-# for a copy of the license you receive with this software
-# and for your rights and obligations
-# especially if you wish to modify or redistribute this code
-# january 19 added random missingness inducer
-# currently about 15M genos/minute without psyco, 30M/minute with
-# so a billion genos should take about 40 minutes with psyco or 80 without...
-# added mendel error generator jan 23 rml
-
-
+#! /usr/local/bin/python2.4
+# pedigree data faker
+# specifically designed for scalability testing of
+# Shaun Purcel's PLINK package
+# derived from John Ziniti's original suggestion
+# allele frequency spectrum and random mating added
+# ross lazarus me fecit january 13 2007
+# copyright ross lazarus 2007
+# without psyco
+# generates about 10k snp genotypes in 2k subjects (666 trios) per minute or so.
+# so 500k (a billion genotypes), at about 4 trios/min will a couple of hours to generate
+# psyco makes it literally twice as quick!!
+# all rights reserved except as granted under the terms of the LGPL
+# see http://www.gnu.org/licenses/lgpl.html
+# for a copy of the license you receive with this software
+# and for your rights and obligations
+# especially if you wish to modify or redistribute this code
+# january 19 added random missingness inducer
+# currently about 15M genos/minute without psyco, 30M/minute with
+# so a billion genos should take about 40 minutes with psyco or 80 without...
+# added mendel error generator jan 23 rml
+
+
import random,sys,time,os,string
-
-from optparse import OptionParser
-
-
-width = 500000
-ALLELES = ['1','2','3','4']
-prog = os.path.split(sys.argv[0])[-1]
-debug = 0
-
-"""Natural-order sorting, supporting embedded numbers.
-# found at http://lists.canonical.org/pipermail/kragen-hacks/2005-October/000419.html
-note test code there removed to conserve brain space
-foo9bar2 < foo10bar2 < foo10bar10
-
-"""
-import random, re, sys
-
-def natsort_key(item):
- chunks = re.split('(\d+(?:\.\d+)?)', item)
- for ii in range(len(chunks)):
- if chunks[ii] and chunks[ii][0] in '0123456789':
- if '.' in chunks[ii]: numtype = float
- else: numtype = int
- # wrap in tuple with '0' to explicitly specify numbers come first
- chunks[ii] = (0, numtype(chunks[ii]))
- else:
- chunks[ii] = (1, chunks[ii])
- return (chunks, item)
-
-def natsort(seq):
- "Sort a sequence of text strings in a reasonable order."
- alist = [item for item in seq]
- alist.sort(key=natsort_key)
- return alist
-
-
-def makeUniformMAFdist(low=0.02, high=0.5):
- """Fake a non-uniform maf distribution to make the data
- more interesting. Provide uniform 0.02-0.5 distribution"""
- MAFdistribution = []
- for i in xrange(int(100*low),int(100*high)+1):
- freq = i/100.0 # uniform
- MAFdistribution.append(freq)
- return MAFdistribution
-
-def makeTriangularMAFdist(low=0.02, high=0.5, beta=5):
- """Fake a non-uniform maf distribution to make the data
- more interesting - more rare alleles """
- MAFdistribution = []
- for i in xrange(int(100*low),int(100*high)+1):
- freq = (51 - i)/100.0 # large numbers of small allele freqs
- for j in range(beta*i): # or i*i for crude exponential distribution
- MAFdistribution.append(freq)
- return MAFdistribution
-
-def makeFbathead(rslist=[], chromlist=[], poslist=[], width=100000):
- """header row
- """
- res = ['%s_%s_%s' % (chromlist[x], poslist[x], rslist[x]) for x in range(len(rslist))]
- return ' '.join(res)
-
-def makeMap( width=500000, MAFdistribution=[], useGP=False):
- """make snp allele and frequency tables for consistent generation"""
- usegp = 1
- snpdb = 'snp126'
- hgdb = 'hg18'
- alleles = []
- freqs = []
- rslist = []
- chromlist = []
- poslist = []
- for snp in range(width):
- random.shuffle(ALLELES)
- alleles.append(ALLELES[0:2]) # need two DIFFERENT alleles!
+
+from optparse import OptionParser
+
+defbasename="RgeneticsData"
+width = 500000
+ALLELES = ['1','2','3','4']
+prog = os.path.split(sys.argv[0])[-1]
+debug = 0
+
+"""Natural-order sorting, supporting embedded numbers.
+# found at http://lists.canonical.org/pipermail/kragen-hacks/2005-October/000419.html
+note test code there removed to conserve brain space
+foo9bar2 < foo10bar2 < foo10bar10
+
+"""
+import random, re, sys
+
+def natsort_key(item):
+ chunks = re.split('(\d+(?:\.\d+)?)', item)
+ for ii in range(len(chunks)):
+ if chunks[ii] and chunks[ii][0] in '0123456789':
+ if '.' in chunks[ii]: numtype = float
+ else: numtype = int
+ # wrap in tuple with '0' to explicitly specify numbers come first
+ chunks[ii] = (0, numtype(chunks[ii]))
+ else:
+ chunks[ii] = (1, chunks[ii])
+ return (chunks, item)
+
+def natsort(seq):
+ "Sort a sequence of text strings in a reasonable order."
+ alist = [item for item in seq]
+ alist.sort(key=natsort_key)
+ return alist
+
+
+def makeUniformMAFdist(low=0.02, high=0.5):
+ """Fake a non-uniform maf distribution to make the data
+ more interesting. Provide uniform 0.02-0.5 distribution"""
+ MAFdistribution = []
+ for i in xrange(int(100*low),int(100*high)+1):
+ freq = i/100.0 # uniform
+ MAFdistribution.append(freq)
+ return MAFdistribution
+
+def makeTriangularMAFdist(low=0.02, high=0.5, beta=5):
+ """Fake a non-uniform maf distribution to make the data
+ more interesting - more rare alleles """
+ MAFdistribution = []
+ for i in xrange(int(100*low),int(100*high)+1):
+ freq = (51 - i)/100.0 # large numbers of small allele freqs
+ for j in range(beta*i): # or i*i for crude exponential distribution
+ MAFdistribution.append(freq)
+ return MAFdistribution
+
+def makeFbathead(rslist=[], chromlist=[], poslist=[], width=100000):
+ """header row
+ """
+ res = ['%s_%s_%s' % (chromlist[x], poslist[x], rslist[x]) for x in range(len(rslist))]
+ return ' '.join(res)
+
+def makeMap( width=500000, MAFdistribution=[], useGP=False):
+ """make snp allele and frequency tables for consistent generation"""
+ usegp = 1
+ snpdb = 'snp126'
+ hgdb = 'hg18'
+ alleles = []
+ freqs = []
+ rslist = []
+ chromlist = []
+ poslist = []
+ for snp in range(width):
+ random.shuffle(ALLELES)
+ alleles.append(ALLELES[0:2]) # need two DIFFERENT alleles!
freqs.append(random.choice(MAFdistribution)) # more rare alleles
- if useGP:
+ if useGP:
try:
import MySQLdb
genome = MySQLdb.Connect('localhost', 'hg18', 'G3gn0m3')
@@ -106,402 +106,402 @@
if debug:
print 'cannot connect to local copy of golden path'
usegp = 0
- if usegp and useGP: # urrrgghh getting snps into chrom offset order is complicated....
- curs.execute('use %s' % hgdb)
- print 'Collecting %d real rs numbers - this may take a while' % width
- # get a random draw of enough reasonable (hapmap) snps with frequency data
- s = '''select distinct chrom,chromEnd, name from %s where avHet > 0 and chrom not like '%%random'
- group by name order by rand() limit %d''' % (snpdb,width)
- curs.execute(s)
- reslist = curs.fetchall()
- reslist = ['%s\t%09d\t%s' % (x[3:],y,z) for x,y,z in reslist] # get rid of chr
- reslist = natsort(reslist)
- for s in reslist:
- chrom,pos,rs = s.split('\t')
- rslist.append(rs)
- chromlist.append(chrom)
- poslist.append(pos)
- else:
- chrom = '1'
- for snp in range(width):
- pos = '%d' % (1000*snp)
- rs = 'rs00%d' % snp
- rslist.append(rs)
- chromlist.append(chrom)
- poslist.append(pos)
- return alleles,freqs, rslist, chromlist, poslist
-
-def writeMap(fprefix = '', fpath='./', rslist=[], chromlist=[], poslist=[], width = 500000):
- """make a faked plink compatible map file - fbat files
+ if usegp and useGP: # urrrgghh getting snps into chrom offset order is complicated....
+ curs.execute('use %s' % hgdb)
+ print 'Collecting %d real rs numbers - this may take a while' % width
+ # get a random draw of enough reasonable (hapmap) snps with frequency data
+ s = '''select distinct chrom,chromEnd, name from %s where avHet > 0 and chrom not like '%%random'
+ group by name order by rand() limit %d''' % (snpdb,width)
+ curs.execute(s)
+ reslist = curs.fetchall()
+ reslist = ['%s\t%09d\t%s' % (x[3:],y,z) for x,y,z in reslist] # get rid of chr
+ reslist = natsort(reslist)
+ for s in reslist:
+ chrom,pos,rs = s.split('\t')
+ rslist.append(rs)
+ chromlist.append(chrom)
+ poslist.append(pos)
+ else:
+ chrom = '1'
+ for snp in range(width):
+ pos = '%d' % (1000*snp)
+ rs = 'rs00%d' % snp
+ rslist.append(rs)
+ chromlist.append(chrom)
+ poslist.append(pos)
+ return alleles,freqs, rslist, chromlist, poslist
+
+def writeMap(fprefix = '', fpath='./', rslist=[], chromlist=[], poslist=[], width = 500000):
+ """make a faked plink compatible map file - fbat files
have the map written as a header line"""
outf = '%s.map'% (fprefix)
- outf = os.path.join(fpath,outf)
- amap = open(outf, 'w')
- res = ['%s\t%s\t0\t%s' % (chromlist[x],rslist[x],poslist[x]) for x in range(len(rslist))]
- res.append('')
- amap.write('\n'.join(res))
- amap.close()
-
-def makeMissing(genos=[], missrate = 0.03, missval = '0'):
- """impose some random missingness"""
- nsnps = len(genos)
- for snp in range(nsnps): # ignore first 6 columns
- if random.random() <= missrate:
- genos[snp] = '%s %s' % (missval,missval)
- return genos
-
-def makeTriomissing(genos=[], missrate = 0.03, missval = '0'):
- """impose some random missingness on a trio - moth eaten like real data"""
- for person in (0,1):
- nsnps = len(genos[person])
- for snp in range(nsnps):
- for person in [0,1,2]:
- if random.random() <= missrate:
- genos[person][snp] = '%s %s' % (missval,missval)
- return genos
-
-
-def makeTriomendel(p1g=(0,0),p2g=(0,0), kiddip = (0,0)):
- """impose some random mendels on a trio
- there are 8 of the 9 mating types we can simulate reasonable errors for
- Note, since random mating dual het parents can produce any genotype we can't generate an interesting
- error for them, so the overall mendel rate will be lower than mendrate, depending on
- allele frequency..."""
- if p1g[0] <> p1g[1] and p2g[0] <> p2g[1]: # both parents het
- return kiddip # cannot simulate a mendel error - anything is legal!
- elif (p1g[0] <> p1g[1]): # p1 is het parent so p2 must be hom
- if p2g[0] == 0: # - make child p2 opposite hom for error
- kiddip = (1,1)
- else:
- kiddip = (0,0)
- elif (p2g[0] <> p2g[1]): # p2 is het parent so p1 must be hom
- if p1g[0] == 0: # - make child p1 opposite hom for error
- kiddip = (1,1)
- else:
- kiddip = (0,0)
- elif (p1g[0] == p1g[1]): # p1 is hom parent and if we get here p2 must also be hom
- if p1g[0] == p2g[0]: # both parents are same hom - make child either het or opposite hom for error
- if random.random() <= 0.5:
- kiddip = (0,1)
- else:
- if p1g[0] == 0:
- kiddip = (1,1)
- else:
- kiddip = (0,0)
- else: # parents are opposite hom - return any hom as an error
- if random.random() <= 0.5:
- kiddip = (0,0)
- else:
- kiddip = (1,1)
- return kiddip
-
-
-
-
-def makeFam(width=100, freqs={}, alleles={}, trio=1, missrate=0.03, missval='0', mendrate=0.0):
- """this family is a simple trio, constructed by random mating two random genotypes
- TODO: why not generate from chromosomes - eg hapmap
- set each haplotype locus according to the conditional
- probability implied by the surrounding loci - eg use both neighboring pairs, triplets
- and quads as observed in hapmap ceu"""
- dadped = '%d 1 0 0 1 1 %s'
- mumped = '%d 2 0 0 2 1 %s' # a mother is a mum where I come from :)
- kidped = '%d 3 1 2 %d %d %s'
- family = [] # result accumulator
- sex = random.choice((1,2)) # for the kid
- affected = random.choice((1,2))
- genos = [[],[],[]] # dad, mum, kid - 0/1 for common,rare initially, then xform to alleles
- # parent1...kidn lists of 0/1 for common,rare initially, then xformed to alleles
- for snp in xrange(width):
- f = freqs[snp]
- for i in range(2): # do dad and mum
- p = random.random()
- a1 = a2 = 0
- if p <= f: # a rare allele
- a1 = 1
- p = random.random()
- if p <= f: # a rare allele
- a2 = 1
- if a1 > a2:
- a1,a2 = a2,a1 # so ordering consistent - 00,01,11
- dip = (a1,a2)
- genos[i].append(dip) # tuples of 0,1
- a1 = random.choice(genos[0][snp]) # dad gamete
- a2 = random.choice(genos[1][snp]) # mum gamete
- if a1 > a2:
- a1,a2 = a2,a1 # so ordering consistent - 00,01,11
- kiddip = (a1,a2) # NSFW mating!
- genos[2].append(kiddip)
- if mendrate > 0:
- if random.random() <= mendrate:
- genos[2][snp] = makeTriomendel(genos[0][snp],genos[1][snp], kiddip)
- achoice = alleles[snp]
- for g in genos: # now convert to alleles using allele dict
- a1 = achoice[g[snp][0]] # get allele letter
- a2 = achoice[g[snp][1]]
- g[snp] = '%s %s' % (a1,a2)
- if missrate > 0:
- genos = makeTriomissing(genos=genos,missrate=missrate, missval=missval)
- family.append(dadped % (trio,' '.join(genos[0]))) # create a row for each member of trio
- family.append(mumped % (trio,' '.join(genos[1])))
- family.append(kidped % (trio,sex,affected,' '.join(genos[2])))
- return family
-
-def makePerson(width=100, aff=1, freqs={}, alleles={}, id=1, missrate = 0.03, missval='0'):
- """make an entire genotype vector for an independent subject"""
- sex = random.choice((1,2))
- if not aff:
- aff = random.choice((1,2))
- genos = [] #0/1 for common,rare initially, then xform to alleles
- family = []
- personped = '%d 1 0 0 %d %d %s'
- poly = (0,1)
- for snp in xrange(width):
- achoice = alleles[snp]
- f = freqs[snp]
- p = random.random()
- a1 = a2 = 0
- if p <= f: # a rare allele
- a1 = 1
- p = random.random()
- if p <= f: # a rare allele
- a2 = 1
- if a1 > a2:
- a1,a2 = a2,a1 # so ordering consistent - 00,01,11
- a1 = achoice[a1] # get allele letter
- a2 = achoice[a2]
- g = '%s %s' % (a1,a2)
- genos.append(g)
- if missrate > 0.0:
- genos = makeMissing(genos=genos,missrate=missrate, missval=missval)
- family.append(personped % (id,sex,aff,' '.join(genos)))
- return family
-
-def makeHapmap(fprefix= 'fakebigped',width=100, aff=[], freqs={},
- alleles={}, nsubj = 2000, trios = True, mendrate=0.03, missrate = 0.03, missval='0'):
- """ fake a hapmap file and a pedigree file for eg haploview
- this is arranged as the transpose of a ped file - cols are subjects, rows are markers
- so we need to generate differently since we can't do the transpose in ram reliably for
- a few billion genotypes...
- """
- outheadprefix = 'rs# alleles chrom pos strand assembly# center protLSID assayLSID panelLSID QCcode %s'
- cfake5 = ["illumina","urn:LSID:illumina.hapmap.org:Protocol:Golden_Gate_1.0.0:1",
-"urn:LSID:illumina.hapmap.org:Assay:27741:1","urn:lsid:dcc.hapmap.org:Panel:CEPH-30-trios:1","QC+"]
- yfake5 = ["illumina","urn:LSID:illumina.hapmap.org:Protocol:Golden_Gate_1.0.0:1",
-"urn:LSID:illumina.hapmap.org:Assay:27741:1","urn:LSID:dcc.hapmap.org:Panel:Yoruba-30-trios:1","QC+"]
- sampids = ids
- if trios:
- ts = '%d trios' % int(nsubj/3.)
- else:
- ts = '%d unrelated subjects' % nsubj
- res = ['#%s fake hapmap file %d snps and %s, faked by %s' % (timenow(), width, ts, prog),]
- res.append('# ross lazarus me fecit')
- res.append(outheadprefix % ' '.join(sampids)) # make a header compatible with hapmap extracts
- outf = open('%s.hmap' % (fprefix), 'w')
- started = time.time()
- if trios:
- ntrios = int(nsubj/3.)
- for n in ntrios: # each is a dict
- row = copy.copy(cfake5) # get first fields
- row = map(str,row)
- if race == "YRI":
- row += yfake5
- elif race == 'CEU':
- row += cfake5
- else:
- row += ['NA' for x in range(5)] # 5 dummy fields = center protLSID assayLSID panelLSID QCcode
- row += [''.join(sorted(line[x])) for x in sampids] # the genotypes in header (sorted) sample id order
- res.append(' '.join(row))
- res.append('')
- outfname = '%s_%s_%s_%dkb.geno' % (gene,probeid,race,2*flank/1000)
- f = file(outfname,'w')
- f.write('\n'.join(res))
- f.close()
- print '### %s: Wrote %d lines to %s' % (timenow(), len(res),outfname)
-
-
+ outf = os.path.join(fpath,outf)
+ amap = open(outf, 'w')
+ res = ['%s\t%s\t0\t%s' % (chromlist[x],rslist[x],poslist[x]) for x in range(len(rslist))]
+ res.append('')
+ amap.write('\n'.join(res))
+ amap.close()
+
+def makeMissing(genos=[], missrate = 0.03, missval = '0'):
+ """impose some random missingness"""
+ nsnps = len(genos)
+ for snp in range(nsnps): # ignore first 6 columns
+ if random.random() <= missrate:
+ genos[snp] = '%s %s' % (missval,missval)
+ return genos
+
+def makeTriomissing(genos=[], missrate = 0.03, missval = '0'):
+ """impose some random missingness on a trio - moth eaten like real data"""
+ for person in (0,1):
+ nsnps = len(genos[person])
+ for snp in range(nsnps):
+ for person in [0,1,2]:
+ if random.random() <= missrate:
+ genos[person][snp] = '%s %s' % (missval,missval)
+ return genos
+
+
+def makeTriomendel(p1g=(0,0),p2g=(0,0), kiddip = (0,0)):
+ """impose some random mendels on a trio
+ there are 8 of the 9 mating types we can simulate reasonable errors for
+ Note, since random mating dual het parents can produce any genotype we can't generate an interesting
+ error for them, so the overall mendel rate will be lower than mendrate, depending on
+ allele frequency..."""
+ if p1g[0] <> p1g[1] and p2g[0] <> p2g[1]: # both parents het
+ return kiddip # cannot simulate a mendel error - anything is legal!
+ elif (p1g[0] <> p1g[1]): # p1 is het parent so p2 must be hom
+ if p2g[0] == 0: # - make child p2 opposite hom for error
+ kiddip = (1,1)
+ else:
+ kiddip = (0,0)
+ elif (p2g[0] <> p2g[1]): # p2 is het parent so p1 must be hom
+ if p1g[0] == 0: # - make child p1 opposite hom for error
+ kiddip = (1,1)
+ else:
+ kiddip = (0,0)
+ elif (p1g[0] == p1g[1]): # p1 is hom parent and if we get here p2 must also be hom
+ if p1g[0] == p2g[0]: # both parents are same hom - make child either het or opposite hom for error
+ if random.random() <= 0.5:
+ kiddip = (0,1)
+ else:
+ if p1g[0] == 0:
+ kiddip = (1,1)
+ else:
+ kiddip = (0,0)
+ else: # parents are opposite hom - return any hom as an error
+ if random.random() <= 0.5:
+ kiddip = (0,0)
+ else:
+ kiddip = (1,1)
+ return kiddip
+
+
+
+
+def makeFam(width=100, freqs={}, alleles={}, trio=1, missrate=0.03, missval='0', mendrate=0.0):
+ """this family is a simple trio, constructed by random mating two random genotypes
+ TODO: why not generate from chromosomes - eg hapmap
+ set each haplotype locus according to the conditional
+ probability implied by the surrounding loci - eg use both neighboring pairs, triplets
+ and quads as observed in hapmap ceu"""
+ dadped = '%d 1 0 0 1 1 %s'
+ mumped = '%d 2 0 0 2 1 %s' # a mother is a mum where I come from :)
+ kidped = '%d 3 1 2 %d %d %s'
+ family = [] # result accumulator
+ sex = random.choice((1,2)) # for the kid
+ affected = random.choice((1,2))
+ genos = [[],[],[]] # dad, mum, kid - 0/1 for common,rare initially, then xform to alleles
+ # parent1...kidn lists of 0/1 for common,rare initially, then xformed to alleles
+ for snp in xrange(width):
+ f = freqs[snp]
+ for i in range(2): # do dad and mum
+ p = random.random()
+ a1 = a2 = 0
+ if p <= f: # a rare allele
+ a1 = 1
+ p = random.random()
+ if p <= f: # a rare allele
+ a2 = 1
+ if a1 > a2:
+ a1,a2 = a2,a1 # so ordering consistent - 00,01,11
+ dip = (a1,a2)
+ genos[i].append(dip) # tuples of 0,1
+ a1 = random.choice(genos[0][snp]) # dad gamete
+ a2 = random.choice(genos[1][snp]) # mum gamete
+ if a1 > a2:
+ a1,a2 = a2,a1 # so ordering consistent - 00,01,11
+ kiddip = (a1,a2) # NSFW mating!
+ genos[2].append(kiddip)
+ if mendrate > 0:
+ if random.random() <= mendrate:
+ genos[2][snp] = makeTriomendel(genos[0][snp],genos[1][snp], kiddip)
+ achoice = alleles[snp]
+ for g in genos: # now convert to alleles using allele dict
+ a1 = achoice[g[snp][0]] # get allele letter
+ a2 = achoice[g[snp][1]]
+ g[snp] = '%s %s' % (a1,a2)
+ if missrate > 0:
+ genos = makeTriomissing(genos=genos,missrate=missrate, missval=missval)
+ family.append(dadped % (trio,' '.join(genos[0]))) # create a row for each member of trio
+ family.append(mumped % (trio,' '.join(genos[1])))
+ family.append(kidped % (trio,sex,affected,' '.join(genos[2])))
+ return family
+
+def makePerson(width=100, aff=1, freqs={}, alleles={}, id=1, missrate = 0.03, missval='0'):
+ """make an entire genotype vector for an independent subject"""
+ sex = random.choice((1,2))
+ if not aff:
+ aff = random.choice((1,2))
+ genos = [] #0/1 for common,rare initially, then xform to alleles
+ family = []
+ personped = '%d 1 0 0 %d %d %s'
+ poly = (0,1)
+ for snp in xrange(width):
+ achoice = alleles[snp]
+ f = freqs[snp]
+ p = random.random()
+ a1 = a2 = 0
+ if p <= f: # a rare allele
+ a1 = 1
+ p = random.random()
+ if p <= f: # a rare allele
+ a2 = 1
+ if a1 > a2:
+ a1,a2 = a2,a1 # so ordering consistent - 00,01,11
+ a1 = achoice[a1] # get allele letter
+ a2 = achoice[a2]
+ g = '%s %s' % (a1,a2)
+ genos.append(g)
+ if missrate > 0.0:
+ genos = makeMissing(genos=genos,missrate=missrate, missval=missval)
+ family.append(personped % (id,sex,aff,' '.join(genos)))
+ return family
+
+def makeHapmap(fprefix= 'fakebigped',width=100, aff=[], freqs={},
+ alleles={}, nsubj = 2000, trios = True, mendrate=0.03, missrate = 0.03, missval='0'):
+ """ fake a hapmap file and a pedigree file for eg haploview
+ this is arranged as the transpose of a ped file - cols are subjects, rows are markers
+ so we need to generate differently since we can't do the transpose in ram reliably for
+ a few billion genotypes...
+ """
+ outheadprefix = 'rs# alleles chrom pos strand assembly# center protLSID assayLSID panelLSID QCcode %s'
+ cfake5 = ["illumina","urn:LSID:illumina.hapmap.org:Protocol:Golden_Gate_1.0.0:1",
+"urn:LSID:illumina.hapmap.org:Assay:27741:1","urn:lsid:dcc.hapmap.org:Panel:CEPH-30-trios:1","QC+"]
+ yfake5 = ["illumina","urn:LSID:illumina.hapmap.org:Protocol:Golden_Gate_1.0.0:1",
+"urn:LSID:illumina.hapmap.org:Assay:27741:1","urn:LSID:dcc.hapmap.org:Panel:Yoruba-30-trios:1","QC+"]
+ sampids = ids
+ if trios:
+ ts = '%d trios' % int(nsubj/3.)
+ else:
+ ts = '%d unrelated subjects' % nsubj
+ res = ['#%s fake hapmap file %d snps and %s, faked by %s' % (timenow(), width, ts, prog),]
+ res.append('# ross lazarus me fecit')
+ res.append(outheadprefix % ' '.join(sampids)) # make a header compatible with hapmap extracts
+ outf = open('%s.hmap' % (fprefix), 'w')
+ started = time.time()
+ if trios:
+ ntrios = int(nsubj/3.)
+ for n in ntrios: # each is a dict
+ row = copy.copy(cfake5) # get first fields
+ row = map(str,row)
+ if race == "YRI":
+ row += yfake5
+ elif race == 'CEU':
+ row += cfake5
+ else:
+ row += ['NA' for x in range(5)] # 5 dummy fields = center protLSID assayLSID panelLSID QCcode
+ row += [''.join(sorted(line[x])) for x in sampids] # the genotypes in header (sorted) sample id order
+ res.append(' '.join(row))
+ res.append('')
+ outfname = '%s_%s_%s_%dkb.geno' % (gene,probeid,race,2*flank/1000)
+ f = file(outfname,'w')
+ f.write('\n'.join(res))
+ f.close()
+ print '### %s: Wrote %d lines to %s' % (timenow(), len(res),outfname)
+
+
def makePed(fprefix= 'fakebigped', fpath='./',
- width=500000, nsubj=2000, MAFdistribution=[],alleles={},
- freqs={}, fbatstyle=True, mendrate = 0.0, missrate = 0.03, missval='0',fbathead=''):
- """fake trios with mendel consistent random mating genotypes in offspring
- with consistent alleles and MAFs for the sample"""
- res = []
- if fbatstyle: # add a header row with the marker names
+ width=500000, nsubj=2000, MAFdistribution=[],alleles={},
+ freqs={}, fbatstyle=True, mendrate = 0.0, missrate = 0.03, missval='0',fbathead=''):
+ """fake trios with mendel consistent random mating genotypes in offspring
+ with consistent alleles and MAFs for the sample"""
+ res = []
+ if fbatstyle: # add a header row with the marker names
res.append(fbathead) # header row for fbat
outfname = '%s.ped'% (fprefix)
outfname = os.path.join(fpath,outfname)
- outf = open(outfname,'w')
- ntrios = int(nsubj/3.)
- outf = open(outfile, 'w')
- started = time.time()
- for trio in xrange(ntrios):
- family = makeFam(width=width, freqs=freqs, alleles=alleles, trio=trio,
- missrate = missrate, mendrate=mendrate, missval=missval)
- res += family
- if (trio + 1) % 10 == 0: # write out to keep ram requirements reasonable
- if (trio + 1) % 50 == 0: # show progress
- dur = time.time() - started
- if dur == 0:
- dur = 1.0
- print 'Trio: %d, %4.1f genos/sec at %6.1f sec' % (trio + 1, width*trio*3/dur, dur)
- outf.write('\n'.join(res))
- outf.write('\n')
- res = []
- if len(res) > 0: # some left
- outf.write('\n'.join(res))
- outf.write('\n')
- outf.close()
+ outf = open(outfname,'w')
+ ntrios = int(nsubj/3.)
+ outf = open(outfile, 'w')
+ started = time.time()
+ for trio in xrange(ntrios):
+ family = makeFam(width=width, freqs=freqs, alleles=alleles, trio=trio,
+ missrate = missrate, mendrate=mendrate, missval=missval)
+ res += family
+ if (trio + 1) % 10 == 0: # write out to keep ram requirements reasonable
+ if (trio + 1) % 50 == 0: # show progress
+ dur = time.time() - started
+ if dur == 0:
+ dur = 1.0
+ print 'Trio: %d, %4.1f genos/sec at %6.1f sec' % (trio + 1, width*trio*3/dur, dur)
+ outf.write('\n'.join(res))
+ outf.write('\n')
+ res = []
+ if len(res) > 0: # some left
+ outf.write('\n'.join(res))
+ outf.write('\n')
+ outf.close()
if debug:
- print '##makeped : %6.1f seconds total runtime' % (time.time() - started)
-
+ print '##makeped : %6.1f seconds total runtime' % (time.time() - started)
+
def makeIndep(fprefix = 'fakebigped', fpath='./',
- width=500000, Nunaff=1000, Naff=1000, MAFdistribution=[],
- alleles={}, freqs={}, fbatstyle=True, missrate = 0.03, missval='0',fbathead=''):
- """fake a random sample from a random mating sample
- with consistent alleles and MAFs"""
- res = []
- Ntot = Nunaff + Naff
- status = [1,]*Nunaff
+ width=500000, Nunaff=1000, Naff=1000, MAFdistribution=[],
+ alleles={}, freqs={}, fbatstyle=True, missrate = 0.03, missval='0',fbathead=''):
+ """fake a random sample from a random mating sample
+ with consistent alleles and MAFs"""
+ res = []
+ Ntot = Nunaff + Naff
+ status = [1,]*Nunaff
status += [2,]*Nunaff
outf = '%s.ped' % (fprefix)
- outf = os.path.join(fpath,outf)
- outf = open(outf, 'w')
- started = time.time()
- #sample = personMaker(width=width, affs=status, freqs=freqs, alleles=alleles, Ntomake=Ntot)
- if fbatstyle: # add a header row with the marker names
- res.append(fbathead) # header row for fbat
- for id in xrange(Ntot):
- if id < Nunaff:
- aff = 1
- else:
- aff = 2
- family = makePerson(width=width, aff=aff, freqs=freqs, alleles=alleles, id=id+1)
- res += family
- if (id % 50 == 0): # write out to keep ram requirements reasonable
- if (id % 200 == 0): # show progress
- dur = time.time() - started
- if dur == 0:
- dur = 1.0
- print 'Id: %d, %4.1f genos/sec at %6.1f sec' % (id, width*id/dur, dur)
- outf.write('\n'.join(res))
- outf.write('\n')
- res = []
- if len(res) > 0: # some left
- outf.write('\n'.join(res))
- outf.write('\n')
- outf.close()
- print '## makeindep: %6.1f seconds total runtime' % (time.time() - started)
-
-u = """
-Generate either trios or independent subjects with a prespecified
-number of random alleles and a uniform or triangular MAF distribution for
-stress testing. No LD is simulated - alleles are random. Offspring for
-trios are generated by random mating the random parental alleles so there are
-no Mendelian errors unless the -M option is used. Mendelian errors are generated
-randomly according to the possible errors given the parental mating type although
-this is fresh code and not guaranteed to work quite right yet - comments welcomed
-
-Enquiries to ross.lazarus(a)gmail.com
-
-eg to generate 700 trios with 500k snps, use:
-fakebigped.py -n 2100 -s 500000
-or to generate 500 independent cases and 500 controls with 100k snps and 0.02 missingness (MCAR), use:
-fakebigped.py -c 500 -n 1000 -s 100000 -m 0.02
-
-fakebigped.py -o myfake -m 0.05 -s 100000 -n 2000
-will make fbat compatible myfake.ped with 100k markers in
-666 trios (total close to 2000 subjects), a uniform MAF distribution and about 5% MCAR missing
-
-fakebigped.py -o myfake -m 0.05 -s 100000 -n 2000 -M 0.05
-will make fbat compatible myfake.ped with 100k markers in
-666 trios (total close to 2000 subjects), a uniform MAF distribution,
-about 5% Mendelian errors and about 5% MCAR missing
-
-
-fakebigped.py -o myfakecc -m 0.05 -s 100000 -n 2000 -c 1000 -l
-will make plink compatible myfakecc.ped and myfakecc.map (that's what the -l option does),
-with 100k markers in 1000 cases and 1000 controls (affection status 2 and 1 respectively),
-a triangular MAF distribution (more rare alleles) and about 5% MCAR missing
-
-You should see about 1/4 million genotypes/second so about an hour for a
-500k snps in 2k subjects and about a 4GB ped file - these are BIG!!
-
-"""
-
-import sys, os, glob
-
-galhtmlprefix = """<?xml version="1.0" encoding="utf-8" ?>
-<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
-<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
-<head>
-<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
-<meta name="generator" content="Galaxy %s tool output - see http://g2.trac.bx.psu.edu/" />
-<title></title>
-<link rel="stylesheet" href="/static/style/base.css" type="text/css" />
-</head>
-<body>
-<div class="document">
-"""
-
-
-def doImport(outfile=None,outpath=None):
- """ import into one of the new html composite data types for Rgenetics
- Dan Blankenberg with mods by Ross Lazarus
- October 2007
- """
- flist = glob.glob(os.path.join(outpath,'*'))
- outf = open(outfile,'w')
- outf.write(galhtmlprefix % prog)
- for i, data in enumerate( flist ):
- outf.write('<li><a href="%s">%s</a></li>\n' % (os.path.split(data)[-1],os.path.split(data)[-1]))
+ outf = os.path.join(fpath,outf)
+ outf = open(outf, 'w')
+ started = time.time()
+ #sample = personMaker(width=width, affs=status, freqs=freqs, alleles=alleles, Ntomake=Ntot)
+ if fbatstyle: # add a header row with the marker names
+ res.append(fbathead) # header row for fbat
+ for id in xrange(Ntot):
+ if id < Nunaff:
+ aff = 1
+ else:
+ aff = 2
+ family = makePerson(width=width, aff=aff, freqs=freqs, alleles=alleles, id=id+1)
+ res += family
+ if (id % 50 == 0): # write out to keep ram requirements reasonable
+ if (id % 200 == 0): # show progress
+ dur = time.time() - started
+ if dur == 0:
+ dur = 1.0
+ print 'Id: %d, %4.1f genos/sec at %6.1f sec' % (id, width*id/dur, dur)
+ outf.write('\n'.join(res))
+ outf.write('\n')
+ res = []
+ if len(res) > 0: # some left
+ outf.write('\n'.join(res))
+ outf.write('\n')
+ outf.close()
+ print '## makeindep: %6.1f seconds total runtime' % (time.time() - started)
+
+u = """
+Generate either trios or independent subjects with a prespecified
+number of random alleles and a uniform or triangular MAF distribution for
+stress testing. No LD is simulated - alleles are random. Offspring for
+trios are generated by random mating the random parental alleles so there are
+no Mendelian errors unless the -M option is used. Mendelian errors are generated
+randomly according to the possible errors given the parental mating type although
+this is fresh code and not guaranteed to work quite right yet - comments welcomed
+
+Enquiries to ross.lazarus(a)gmail.com
+
+eg to generate 700 trios with 500k snps, use:
+fakebigped.py -n 2100 -s 500000
+or to generate 500 independent cases and 500 controls with 100k snps and 0.02 missingness (MCAR), use:
+fakebigped.py -c 500 -n 1000 -s 100000 -m 0.02
+
+fakebigped.py -o myfake -m 0.05 -s 100000 -n 2000
+will make fbat compatible myfake.ped with 100k markers in
+666 trios (total close to 2000 subjects), a uniform MAF distribution and about 5% MCAR missing
+
+fakebigped.py -o myfake -m 0.05 -s 100000 -n 2000 -M 0.05
+will make fbat compatible myfake.ped with 100k markers in
+666 trios (total close to 2000 subjects), a uniform MAF distribution,
+about 5% Mendelian errors and about 5% MCAR missing
+
+
+fakebigped.py -o myfakecc -m 0.05 -s 100000 -n 2000 -c 1000 -l
+will make plink compatible myfakecc.ped and myfakecc.map (that's what the -l option does),
+with 100k markers in 1000 cases and 1000 controls (affection status 2 and 1 respectively),
+a triangular MAF distribution (more rare alleles) and about 5% MCAR missing
+
+You should see about 1/4 million genotypes/second so about an hour for a
+500k snps in 2k subjects and about a 4GB ped file - these are BIG!!
+
+"""
+
+import sys, os, glob
+
+galhtmlprefix = """<?xml version="1.0" encoding="utf-8" ?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
+<head>
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+<meta name="generator" content="Galaxy %s tool output - see http://g2.trac.bx.psu.edu/" />
+<title></title>
+<link rel="stylesheet" href="/static/style/base.css" type="text/css" />
+</head>
+<body>
+<div class="document">
+"""
+
+
+def doImport(outfile=None,outpath=None):
+ """ import into one of the new html composite data types for Rgenetics
+ Dan Blankenberg with mods by Ross Lazarus
+ October 2007
+ """
+ flist = glob.glob(os.path.join(outpath,'*'))
+ outf = open(outfile,'w')
+ outf.write(galhtmlprefix % prog)
+ for i, data in enumerate( flist ):
+ outf.write('<li><a href="%s">%s</a></li>\n' % (os.path.split(data)[-1],os.path.split(data)[-1]))
outf.write('<br><h3>This is simulated null genotype data generated by Rgenetics!</h3>')
outf.write('%s called with command line:<br><pre>' % prog)
outf.write(' '.join(sys.argv))
- outf.write('\n</pre>\n')
- outf.write("</div></body></html>")
- outf.close()
-
-
-
-if __name__ == "__main__":
- """
- """
- parser = OptionParser(usage=u, version="%prog 0.01")
- a = parser.add_option
- a("-n","--nsubjects",type="int",dest="Ntot",
- help="nsubj: total number of subjects",default=2000)
- a("-t","--title",dest="title",
- help="title: file basename for outputs",default='fakeped')
- a("-c","--cases",type="int",dest="Naff",
- help="number of cases: independent subjects with status set to 2 (ie cases). If not set, NTOT/3 trios will be generated", default = 0)
- a("-s","--snps",dest="width",type="int",
- help="snps: total number of snps per subject", default=1000)
- a("-d","--distribution",dest="MAFdist",default="Uniform",
- help="MAF distribution - default is Uniform, can be Triangular")
- a("-o","--outf",dest="outf",
- help="Output file", default = 'fakeped')
- a("-p","--outpath",dest="outpath",
- help="Path for output files", default = './')
- a("-l","--pLink",dest="outstyle", default='L',
- help="Ped files as for Plink - no header, separate Map file - default is Plink style")
- a("-w","--loWmaf", type="float", dest="lowmaf", default=0.01, help="Lower limit for SNP MAF (minor allele freq)")
- a("-m","--missing",dest="missrate",type="float",
- help="missing: probability of missing MCAR - default 0.0", default=0.0)
- a("-v","--valmiss",dest="missval",
- help="missing character: Missing allele code - usually 0 or N - default 0", default="0")
- a("-M","--Mendelrate",dest="mendrate",type="float",
- help="Mendelian error rate: probability of a mendel error per trio, default=0.0", default=0.0)
- a("-H","--noHGRS",dest="useHG",type="int",
- help="Use local copy of UCSC snp126 database to generate real rs numbers", default=True)
- (options,args) = parser.parse_args()
- low = options.lowmaf
+ outf.write('\n</pre>\n')
+ outf.write("</div></body></html>")
+ outf.close()
+
+
+
+if __name__ == "__main__":
+ """
+ """
+ parser = OptionParser(usage=u, version="%prog 0.01")
+ a = parser.add_option
+ a("-n","--nsubjects",type="int",dest="Ntot",
+ help="nsubj: total number of subjects",default=2000)
+ a("-t","--title",dest="title",
+ help="title: file basename for outputs",default='fakeped')
+ a("-c","--cases",type="int",dest="Naff",
+ help="number of cases: independent subjects with status set to 2 (ie cases). If not set, NTOT/3 trios will be generated", default = 0)
+ a("-s","--snps",dest="width",type="int",
+ help="snps: total number of snps per subject", default=1000)
+ a("-d","--distribution",dest="MAFdist",default="Uniform",
+ help="MAF distribution - default is Uniform, can be Triangular")
+ a("-o","--outf",dest="outf",
+ help="Output file", default = 'fakeped')
+ a("-p","--outpath",dest="outpath",
+ help="Path for output files", default = './')
+ a("-l","--pLink",dest="outstyle", default='L',
+ help="Ped files as for Plink - no header, separate Map file - default is Plink style")
+ a("-w","--loWmaf", type="float", dest="lowmaf", default=0.01, help="Lower limit for SNP MAF (minor allele freq)")
+ a("-m","--missing",dest="missrate",type="float",
+ help="missing: probability of missing MCAR - default 0.0", default=0.0)
+ a("-v","--valmiss",dest="missval",
+ help="missing character: Missing allele code - usually 0 or N - default 0", default="0")
+ a("-M","--Mendelrate",dest="mendrate",type="float",
+ help="Mendelian error rate: probability of a mendel error per trio, default=0.0", default=0.0)
+ a("-H","--noHGRS",dest="useHG",type="int",
+ help="Use local copy of UCSC snp126 database to generate real rs numbers", default=True)
+ (options,args) = parser.parse_args()
+ low = options.lowmaf
try:
os.makedirs(options.outpath)
except:
pass
- if options.MAFdist.upper() == 'U':
- mafDist = makeUniformMAFdist(low=low, high=0.5)
- else:
+ if options.MAFdist.upper() == 'U':
+ mafDist = makeUniformMAFdist(low=low, high=0.5)
+ else:
mafDist = makeTriangularMAFdist(low=low, high=0.5, beta=5)
alleles,freqs, rslist, chromlist, poslist = makeMap(width=int(options.width),
MAFdistribution=mafDist, useGP=False)
@@ -511,25 +511,25 @@
title = string.translate(options.title,trantab)
if options.outstyle == 'F':
- fbatstyle = True
- fbathead = makeFbathead(rslist=rslist, chromlist=chromlist, poslist=poslist, width=options.width)
+ fbatstyle = True
+ fbathead = makeFbathead(rslist=rslist, chromlist=chromlist, poslist=poslist, width=options.width)
else:
- fbatstyle = False
- writeMap(fprefix=title, rslist=rslist, fpath=options.outpath,
- chromlist=chromlist, poslist=poslist, width=options.width)
- if options.Naff > 0: # make case control data
- makeIndep(fprefix = title, fpath=options.outpath,
- width=options.width, Nunaff=options.Ntot-options.Naff,
- Naff=options.Naff, MAFdistribution=mafDist,alleles=alleles, freqs=freqs,
- fbatstyle=fbatstyle, missrate=options.missrate, missval=options.missval,
- fbathead=fbathead)
- else:
- makePed(fprefix=options.fprefix, fpath=options.fpath,
- width=options.width, MAFdistribution=mafDist, nsubj=options.Ntot,
- alleles=alleles, freqs=freqs, fbatstyle=fbatstyle, missrate=options.missrate,
- mendrate=options.mendrate, missval=options.missval,
+ fbatstyle = False
+ writeMap(fprefix=defbasename, rslist=rslist, fpath=options.outpath,
+ chromlist=chromlist, poslist=poslist, width=options.width)
+ if options.Naff > 0: # make case control data
+ makeIndep(fprefix = defbasename, fpath=options.outpath,
+ width=options.width, Nunaff=options.Ntot-options.Naff,
+ Naff=options.Naff, MAFdistribution=mafDist,alleles=alleles, freqs=freqs,
+ fbatstyle=fbatstyle, missrate=options.missrate, missval=options.missval,
fbathead=fbathead)
- doImport(outfile=options.outf,outpath=options.outpath)
-
-
-
+ else:
+ makePed(fprefix=defbasename, fpath=options.fpath,
+ width=options.width, MAFdistribution=mafDist, nsubj=options.Ntot,
+ alleles=alleles, freqs=freqs, fbatstyle=fbatstyle, missrate=options.missrate,
+ mendrate=options.mendrate, missval=options.missval,
+ fbathead=fbathead)
+ doImport(outfile=options.outf,outpath=options.outpath)
+
+
+
--- a/tools/rgenetics/rgfakePed.xml Fri May 13 21:24:03 2011 -0400
+++ b/tools/rgenetics/rgfakePed.xml Mon May 16 09:12:19 2011 -0400
@@ -1,14 +1,13 @@
-<tool id="rgfakePed1" name="Null genotypes">
+<tool id="rgfakePed1" name="Null genotypes" version="0.02"><description>for testing</description><command interpreter="python">rgfakePed.py --title '$title'
-o '$out_file1' -p '$out_file1.files_path' -c '$ncases' -n '$ntotal'
-s '$nsnp' -w '$lowmaf' -v '$missingValue' -l '$outFormat'
-d '$mafdist' -m '$missingRate' -M '$mendelRate' </command><inputs>
- <page><param name="title"
- type="text"
+ type="text" value="Fake_test_geno_data"
help="Name for outputs from this job"
label="Descriptive short name"/><param name="ntotal"
@@ -55,11 +54,10 @@
help = "Missing allele value"
label="Missing value for an allele for the output ped file"/>
- </page></inputs><outputs>
- <data format="lped" name="out_file1" label="${title}.lped" />
+ <data format="lped" name="out_file1" label="${title}.lped"/></outputs><tests><test>
@@ -74,8 +72,8 @@
<param name="mendelRate" value="0" /><param name="missingValue" value="0" /><output name='out_file1' file='rgtestouts/rgfakePed/rgfakePedtest1.lped' ftype='lped' compare="diff" lines_diff='5'>
- <extra_files type="file" name='rgfakePedtest1.ped' value="rgtestouts/rgfakePed/rgfakePedtest1.ped" compare="diff" lines_diff='80'/>
- <extra_files type="file" name='rgfakePedtest1.map' value="rgtestouts/rgfakePed/rgfakePedtest1.map" compare="diff" />
+ <extra_files type="file" name='RgeneticsData.ped' value="rgtestouts/rgfakePed/rgfakePedtest1.ped" compare="diff" lines_diff='80'/>
+ <extra_files type="file" name='RgeneticsData.map' value="rgtestouts/rgfakePed/rgfakePedtest1.map" compare="diff" /></output></test></tests>
@@ -100,11 +98,15 @@
This tool is very experimental
-**Attribution**
+.. class:: infomark
+
+**Attribution and Licensing**
+
Designed and written for the Rgenetics Galaxy tools
copyright Ross Lazarus 2007 (ross.lazarus(a)gmail.com)
-Licensed under the terms of the LGPL
-as documented http://www.gnu.org/licenses/lgpl.html
+Licensed under the terms of the _LGPL
+
+ .. _LGPL: http://www.gnu.org/copyleft/lesser.html
</help></tool>
http://bitbucket.org/galaxy/galaxy-central/changeset/2790f54a4fe7/
changeset: r5566:2790f54a4fe7
user: fubar
date: 2011-05-16 16:03:47
summary: removed python2.4 from rgfakePed.py
affected #: 1 file (161 bytes)
--- a/tools/rgenetics/rgfakePed.py Mon May 16 09:12:19 2011 -0400
+++ b/tools/rgenetics/rgfakePed.py Mon May 16 10:03:47 2011 -0400
@@ -1,4 +1,6 @@
-#! /usr/local/bin/python2.4
+# modified may 2011 to name components (map/ped) as RgeneticsData to align with default base_name
+# otherwise downstream tools fail
+# modified march 2011 to remove post execution hook
# pedigree data faker
# specifically designed for scalability testing of
# Shaun Purcel's PLINK package
Repository URL: https://bitbucket.org/galaxy/galaxy-central/
--
This is a commit notification from bitbucket.org. You are receiving
this because you have the service enabled, addressing the recipient of
this email.