Alignment Extractors - in batch?
Hello,

I have multiple sequence alignment files in MAF (*.maf) format, containing whole-genome alignments from 6 species. I'm interested in extracting the sequence surrounding the exons of ~16,000 genes. Is it possible to run the Alignment Extractors tool in batch? Could I run it on the command line?

Thanks in advance,
Abhi Ratnakumar

By the way, I'm a PhD student at Uppsala University, so any executables that you provide will be used for academic purposes only.
Abhi,

The alignment manipulation tools in Galaxy are based on command line programs available as part of bx-python:

    http://bx-python.trac.bx.psu.edu

If you are going to be repeatedly extracting various features from a set of alignments, your best bet is to index them first. To index a maf file:

    maf_build_index.py -s hg18 chrX.maf

(Passing the -s option produces indexes for just that one species, saving time and disk space. If you omit this option, your indexes will allow extracting relative to coordinates on all 6 species.) This will build an index in "chrX.maf.index". Then to extract:

    cat my_regions.bed | maf_extract_ranges_indexed.py -p hg18. chrX.maf > out.maf

In this case I assume your bed file's first column contains values like "chrX". The -p option adds a prefix to these, creating "hg18.chrX" to match the appropriate src in the maf file. You can pass any number of maf files to maf_extract_ranges_indexed.py, so if your bed file has regions across the whole genome, you could do something like:

    cat my_regions.bed | maf_extract_ranges_indexed.py -p hg18. /directory/of/genome/wide/mafs/*.maf > out.maf

See the help for more information:

    james@uninvisible% maf_extract_ranges_indexed.py -h
    Usage: maf_extract_ranges_indexed.py maf_fname1 maf_fname2 ... [options] < interval_file

    Options:
      -h, --help            show this help message and exit
      -m MINCOLS, --mincols=MINCOLS
                            Minimum length (columns) required for alignment to be
                            output
      -c, --chop            Should blocks be chopped to only portion overlapping
                            (no by default)
      -s SRC, --src=SRC     Use this src for all intervals
      -p PREFIX, --prefix=PREFIX
                            Prepend this to each src before lookup
      -d DIR, --dir=DIR     Write each interval as a separate file in this
                            directory
      -S, --strand          Strand is included as an additional column, and the
                            blocks are reverse complemented (if necessary) so that
                            they are always on that strand w/r/t the src species.
      -C, --usecache        Use a cache that keeps blocks of the MAF files in
                            memory (requires ~20MB per MAF)

Let me know if you have other questions.

-- jt

On Jan 23, 2009, at 8:28 AM, Abhi Ratnakumar wrote:
> Hello,
>
> I have multiple sequence alignment files in the *maf format, containing whole genome alignments from 6 species. I'm interested in extracting the sequence surrounding the exons of ~16,000 genes. I was wondering if it would be possible to run the Alignment Extractors tool in batch? Would it be possible for me to run it on the command line?
>
> Thanks in advance,
> Abhi Ratnakumar
>
> By the way I'm a PhD student at Uppsala University, so any executables that you provide will be used for academic purposes only.
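For a batch run over many MAF files, the two steps described above (index each file, then extract from all of them at once) could be driven from a small Python script. This is only a sketch: the function names `build_index_command`, `build_extract_command`, and `run_extract` are hypothetical helpers made up for illustration, and the only flags used (-s for indexing; -p and -c for extraction) are the ones quoted in this thread. It assumes both bx-python scripts are on your PATH.

```python
import shlex
import subprocess


def build_index_command(maf_path, species=None):
    """Return the argv list for indexing one MAF file with maf_build_index.py.

    species (e.g. "hg18") is passed via -s to index a single species only.
    """
    argv = ["maf_build_index.py"]
    if species:
        argv += ["-s", species]
    argv.append(maf_path)
    return argv


def build_extract_command(maf_paths, prefix=None, chop=False):
    """Return the argv list for maf_extract_ranges_indexed.py.

    prefix (e.g. "hg18.") is prepended to each BED chromosome name via -p;
    chop=True adds -c so blocks are trimmed to the overlapping portion.
    """
    argv = ["maf_extract_ranges_indexed.py"]
    if prefix:
        argv += ["-p", prefix]
    if chop:
        argv.append("-c")
    argv += list(maf_paths)
    return argv


def run_extract(maf_paths, bed_path, out_path, prefix=None):
    """Run the extraction, feeding the BED file on stdin and writing MAF to out_path."""
    argv = build_extract_command(maf_paths, prefix=prefix)
    with open(bed_path) as bed, open(out_path, "w") as out:
        subprocess.run(argv, stdin=bed, stdout=out, check=True)


if __name__ == "__main__":
    # Hypothetical file names; print the commands rather than running them.
    mafs = ["chr1.maf", "chrX.maf"]
    for maf in mafs:
        print(shlex.join(build_index_command(maf, species="hg18")))
    print(shlex.join(build_extract_command(mafs, prefix="hg18."))
          + " < my_regions.bed > out.maf")
```

Printing the commands first makes it easy to check them by eye before calling `run_extract` on the real files.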
It looks like the command line tool we have for this expects the MAFs to be loaded into a UCSC-style database. I have a feeling that the Galaxy tools that James Taylor mentions will be easier to use in your context.
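Whichever tool is used, the regions of interest from the original question (sequence surrounding exons) have to be written out as intervals first. The following is a minimal sketch of one way to do that, assuming the exons are already available as BED lines (chromosome, 0-based start, end) and that "surrounding" means a fixed-size window on each side of every exon. The function names and the flank size are illustrative assumptions, not part of bx-python or Galaxy.

```python
def flank_intervals(chrom, start, end, flank):
    """Return the upstream and downstream flanks of the interval [start, end).

    Coordinates are 0-based, half-open (BED convention). The upstream flank is
    clipped at position 0; the downstream flank is NOT clipped at the chromosome
    end, because chromosome lengths are not known here.
    """
    upstream = (chrom, max(0, start - flank), start)
    downstream = (chrom, end, end + flank)
    # Drop empty intervals (e.g. an exon starting at position 0, or flank == 0).
    return [iv for iv in (upstream, downstream) if iv[2] > iv[1]]


def exons_to_flank_bed(exon_lines, flank):
    """Convert BED exon lines into BED lines for the flanking regions."""
    out = []
    for line in exon_lines:
        line = line.strip()
        # Skip blank lines and common BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        for c, s, e in flank_intervals(chrom, start, end, flank):
            out.append(f"{c}\t{s}\t{e}")
    return out


if __name__ == "__main__":
    exons = ["chrX\t100\t200", "chrX\t20\t60"]
    print("\n".join(exons_to_flank_bed(exons, flank=50)))
```

The resulting lines can be saved to a file such as my_regions.bed, with chromosome names like "chrX" in the first column as assumed earlier in the thread.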
participants (3)
- Abhi Ratnakumar
- James Taylor
- Jim Kent