[galaxy-dev] Galaxy megablastdb indexer script proposal

24 Jun 2011

Hi Galaxy Dev,

Nice to meet you.

My name is Jacob and I am from México. I currently work in the
Bioinformatics group at LANGEBIO (Laboratorio Nacional de Genomica para la
Biodiversidad).

I would like to contribute to the Galaxy dev team. I developed a shell
script to set up reference genomes for megablast.

The Linux shell script that I prepared uses your current prepdb.py script.

The mini pipeline workflow:

1. Read input values:
* fasta filename (nucleotide)
* blast format: F->nucleotide, T->protein
* database output name
* fasta sequence ID format (ncbi-gbk, other)

where:

ncbi-gbk means something like ">gi|gi-number|gb|accession|locus"
other means something like ">my own IDs"

2. If the fasta sequence IDs are in NCBI format (ncbi-gk), execute the
prepdb.py Python script to generate a new fasta file, then go to step 4.
Example:
...
gi|gi-number|gb|accession|locus

3. If the fasta sequence IDs use different (other) identifiers, convert them
to Galaxy format first, then execute the prepdb.py Python script, then go to
step 4. Example:
...
my own identifier

4. Format the database using the formatdb BLAST utility.

5. Set up the Megablast Galaxy loc file.

6. Delete temp files.

7. Done.
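
The ID check and the formatdb step described above could look roughly like the sketch below. This is not the code from set_galaxy_blastdb.sh: the function names classify_header and formatdb_command and the labels "ncbi" and "other" are hypothetical, chosen only for illustration. The formatdb options used (-i input file, -p protein T/F, -n database name) are the standard ones from the legacy BLAST formatdb utility. The command is only printed, not executed, so the sketch runs without BLAST installed.

```shell
#!/bin/sh
# Hypothetical helper: classify one FASTA header line.
# Prints "ncbi" for headers like ">gi|12345|gb|ACCESSION|LOCUS",
# and "other" for anything else.
classify_header() {
    case "$1" in
        ">gi|"*"|gb|"*) echo "ncbi" ;;
        *)              echo "other" ;;
    esac
}

# Hypothetical helper: build (but do not run) the formatdb command.
# $1 = fasta file, $2 = protein flag (T or F), $3 = database name
formatdb_command() {
    printf 'formatdb -i %s -p %s -n %s\n' "$1" "$2" "$3"
}

# Illustrative values only; the header below is made up.
classify_header '>gi|12345|gb|ABC123|LOCUS1'
formatdb_command yeast_nt.fasta F yeastdb
```

In a real script the header would come from the first line of the fasta file (for example with head -n 1), and the printed command would be executed after prepdb.py has produced the new fasta file.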

Usage example:

query help:
./set_galaxy_blastdb.sh -help

Format a BLAST nucleotide database for yeast and set the Galaxy loc file:
./set_galaxy_blastdb.sh \
    -fasta /media/xlinux/genomes/galaxy/palomero/megablast/yeast_nt.fasta \
    -p F -n yeastdb -fastadesc ncbi-gk

When the script finishes, we have the following results:

$ ls /media/xlinux/genomes/galaxy/palomero/megablast/
blastdb.loc  new.yeast_nt.fasta  yeastdb.nhr  yeastdb.nin  yeastdb.nsq  yeast_nt.fasta
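
A result like the listing above can be verified with a small check. The sketch below is only an illustration: the function name check_nucleotide_db is hypothetical, and it assumes a single-volume nucleotide database, for which formatdb writes the .nhr, .nin and .nsq files. The demo creates empty placeholder files in a temporary directory instead of touching a real database.

```shell
#!/bin/sh
# Hypothetical helper: confirm that the three nucleotide index files
# (.nhr, .nin, .nsq) exist for a database.
# $1 = directory, $2 = database name
check_nucleotide_db() {
    dir=$1
    name=$2
    for ext in nhr nin nsq; do
        if [ ! -f "$dir/$name.$ext" ]; then
            echo "missing: $dir/$name.$ext"
            return 1
        fi
    done
    echo "ok: $name"
}

# Demo with empty placeholder files in a temporary directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/yeastdb.nhr" "$demo_dir/yeastdb.nin" "$demo_dir/yeastdb.nsq"
check_nucleotide_db "$demo_dir" yeastdb
rm -rf "$demo_dir"
```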

How can I contribute this shell script?

Thanks in advance

--
Jacob
http://www.langebio.cinvestav.mx/bioinformatica/jacob



Jacob Israel Cervantes Luevano (LANGEBIO)