The alignment manipulation tools in Galaxy are based on command-line
programs available as part of bx-python.
If you are going to be repeatedly extracting various features from a
set of alignments, your best bet is to index them first. To index a
MAF file:
maf_build_index.py -s hg18 chrX.maf
(Passing the -s option produces indexes for just one particular species,
saving time and disk space. If you omit this option, your indexes will
allow extracting relative to coordinates on all 6 species.)
This will build an index in "chrX.maf.index".
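
If you have many MAF files to index, a small wrapper can help. The
following is only a sketch: the helper names (index_path,
index_commands) are made up for illustration, and it assumes the index
always lands next to the MAF file with ".index" appended, as in the
example above. It only builds the command lines; it does not run them.

```python
import os


def index_path(maf_path):
    """Index file name, assuming the '<maf>.index' convention shown above."""
    return maf_path + ".index"


def index_commands(maf_paths, species=None):
    """Build one maf_build_index.py command per MAF file that has no index yet.

    Hypothetical helper: returns argument lists, it does not execute anything.
    """
    commands = []
    for maf in maf_paths:
        if os.path.exists(index_path(maf)):
            continue  # already indexed, skip it
        cmd = ["maf_build_index.py"]
        if species:
            cmd += ["-s", species]  # restrict the index to one species
        cmd.append(maf)
        commands.append(cmd)
    return commands
```

Each returned list could then be handed to subprocess.run() if you want
to execute it from Python.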
Then to extract:
cat my_regions.bed | maf_extract_ranges_indexed.py -p hg18. chrX.maf
In this case I assume your bed file's first column contains values
like "chrX". The -p option adds a prefix to these, creating
"hg18.chrX" to match the appropriate src in the maf file.
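
To make the effect of -p concrete, here is a minimal sketch of the
lookup key built from one bed line. This is an illustration of the
idea, not the script's actual code, and the function name
parse_bed_interval is made up.

```python
def parse_bed_interval(bed_line, prefix=""):
    """Return (src, start, end) for one BED line.

    Illustrative only: the prefix is simply prepended to the first
    column, so 'chrX' with prefix 'hg18.' becomes 'hg18.chrX'.
    """
    fields = bed_line.split()
    src = prefix + fields[0]
    start = int(fields[1])
    end = int(fields[2])
    return src, start, end


print(parse_bed_interval("chrX\t1000\t2000", prefix="hg18."))
```

If the first column of your bed file already contains "hg18.chrX", you
would leave the prefix off.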
You can pass any number of maf files to maf_extract_ranges_indexed.py,
so if your bed file has regions across the whole genome, you could do:
cat my_regions.bed | maf_extract_ranges_indexed.py -p hg18. /
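
If you would rather drive this from a script, something along these
lines should work. It is a sketch under assumptions: the per-chromosome
file names in the usage line are placeholders for wherever your MAF
files actually live, and build_extract_command is a made-up helper. It
only assembles the argument list; the bed file still has to be supplied
on standard input.

```python
def build_extract_command(maf_files, prefix=None):
    """Assemble the maf_extract_ranges_indexed.py argument list.

    Hypothetical helper: options first, then every MAF file, matching
    the command lines shown above. Nothing is executed here.
    """
    if not maf_files:
        raise ValueError("need at least one MAF file")
    cmd = ["maf_extract_ranges_indexed.py"]
    if prefix:
        cmd += ["-p", prefix]  # prepended to the first bed column
    cmd += list(maf_files)
    return cmd


# Placeholder file names, not real paths.
print(build_extract_command(["chr1.maf", "chr2.maf", "chrX.maf"], prefix="hg18."))
```

To run it, you could pass the list to subprocess.run() with
stdin=open("my_regions.bed").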
See the help for more information:
james@uninvisible% maf_extract_ranges_indexed.py -
Usage: maf_extract_ranges_indexed.py maf_fname1 maf_fname2 ...
       [options] < interval_file

  -h, --help            show this help message and exit
  -m MINCOLS, --mincols=MINCOLS
                        Minimum length (columns) required for
                        alignment to be
  -c, --chop            Should blocks be chopped to only portion
                        (no by default)
  -s SRC, --src=SRC     Use this src for all intervals
  -p PREFIX, --prefix=PREFIX
                        Prepend this to each src before lookup
  -d DIR, --dir=DIR     Write each interval as a separate file in this
  -S, --strand          Strand is included as an additional column,
                        blocks are reverse complemented (if
                        necessary) so that
                        they are always on that strand w/r/t the src
  -C, --usecache        Use a cache that keeps blocks of the MAF
                        memory (requires ~20MB per MAF)
Let me know if you have other questions.
On Jan 23, 2009, at 8:28 AM, Abhi Ratnakumar wrote:
I have multiple sequence alignment files in the *maf format,
containing whole genome alignments from 6 species. I'm interested in
extracting the sequence surrounding the exons of ~16,000 genes. I was
wondering if it would be possible to run the Alignment Extractors tool
in batch? Would it be possible for me to run it on the command line?
Thanks in advance
By the way I'm a PhD student at Uppsala University, so any executables
that you provide will be used for academic purposes only.
galaxy-dev mailing list