Abhi,
The alignment manipulation tools in Galaxy are based on command line
programs available as part of bx-python:
http://bx-python.trac.bx.psu.edu
If you are going to be repeatedly extracting various features from a
set of alignments, your best bet is to index them first. To index a
maf file:
maf_build_index.py -s hg18 chrX.maf
(Passing the -s option produces indexes for just that one species,
saving time and disk space. If you omit this option, your indexes will
allow extracting relative to coordinates on all 6 species.)
This will build an index in "chrX.maf.index".
Then to extract:
cat my_regions.bed | maf_extract_ranges_indexed.py -p hg18. chrX.maf > out.maf
In this case I assume your bed file's first column contains values
like "chrX". The -p option adds a prefix to these, creating
"hg18.chrX" to match the appropriate src in the maf file.
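To make the prefix step concrete, here is a minimal Python sketch of what -p does conceptually to each line of the bed file. This is only an illustration of the behaviour described above, not the bx-python source; the function name prefixed_src is made up for the example.

```python
def prefixed_src(bed_line, prefix="hg18."):
    """Return (src, start, end) for one BED line, with the prefix
    prepended to the chromosome name (hypothetical helper, not bx-python)."""
    fields = bed_line.split()  # BED columns are whitespace/tab separated
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    return prefix + chrom, start, end


# "chrX" in the bed file becomes "hg18.chrX", the src name used in the maf
print(prefixed_src("chrX\t1000\t2000\n"))  # ('hg18.chrX', 1000, 2000)
```

If your bed file already has names like "hg18.chrX" in the first column, you would not need -p at all.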
You can pass any number of maf files to maf_extract_ranges_indexed.py,
so if your bed file has regions across the whole genome, you could do
something like:
cat my_regions.bed | maf_extract_ranges_indexed.py -p hg18. /directory/of/genome/wide/mafs/*.maf > out.maf
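If you would rather drive this from a script (for example to run it in batch), the same command could be wrapped roughly as follows. This is a sketch under a few assumptions: maf_extract_ranges_indexed.py is on your PATH, the maf files have already been indexed with maf_build_index.py, and the helper names build_extract_command and run_extract are invented for this example. Only the -p option shown above is used.

```python
import glob
import os
import subprocess


def build_extract_command(maf_dir, prefix="hg18."):
    """Build the argument list for maf_extract_ranges_indexed.py
    over every .maf file found in maf_dir (hypothetical helper)."""
    maf_files = sorted(glob.glob(os.path.join(maf_dir, "*.maf")))
    return ["maf_extract_ranges_indexed.py", "-p", prefix] + maf_files


def run_extract(bed_path, maf_dir, out_path, prefix="hg18."):
    """Feed the bed file on stdin and write the extracted blocks to out_path,
    mirroring `cat my_regions.bed | ... > out.maf`."""
    cmd = build_extract_command(maf_dir, prefix)
    with open(bed_path) as bed, open(out_path, "w") as out:
        subprocess.run(cmd, stdin=bed, stdout=out, check=True)
```

Calling run_extract("my_regions.bed", "/directory/of/genome/wide/mafs", "out.maf") would then be equivalent to the shell command above.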
See the help for more information:
james@uninvisible% maf_extract_ranges_indexed.py -h
Usage: maf_extract_ranges_indexed.py maf_fname1 maf_fname2 ... [options] < interval_file
Options:
  -h, --help            show this help message and exit
  -m MINCOLS, --mincols=MINCOLS
                        Minimum length (columns) required for alignment to be
                        output
  -c, --chop            Should blocks be chopped to only portion overlapping
                        (no by default)
  -s SRC, --src=SRC     Use this src for all intervals
  -p PREFIX, --prefix=PREFIX
                        Prepend this to each src before lookup
  -d DIR, --dir=DIR     Write each interval as a separate file in this
                        directory
  -S, --strand          Strand is included as an additional column, and the
                        blocks are reverse complemented (if necessary) so that
                        they are always on that strand w/r/t the src species.
  -C, --usecache        Use a cache that keeps blocks of the MAF files in
                        memory (requires ~20MB per MAF)
Let me know if you have other questions.
-- jt
On Jan 23, 2009, at 8:28 AM, Abhi Ratnakumar wrote:
Hello,
I have multiple sequence alignment files in the *maf format,
containing whole genome alignments from 6 species. I'm interested in
extracting the sequence surrounding the exons of ~16,000 genes. I was
wondering if it would be possible to run the Alignment Extractors tool
in batch? Would it be possible for me to run it on the command line?
Thanks in advance,
Abhi Ratnakumar
By the way I'm a PhD student at Uppsala University, so any executables
that you provide will be used for academic purposes only.
_______________________________________________
galaxy-dev mailing list
galaxy-dev(a)bx.psu.edu
http://mail.bx.psu.edu/cgi-bin/mailman/listinfo/galaxy-dev