Hi Galaxy Dev,

Nice to meet you.

My name is Jacob and I am from México. I currently work in the Bioinformatics group at LANGEBIO (Laboratorio Nacional de Genomica para la Biodiversidad).

I would like to contribute to the Galaxy dev team. I have developed a shell script that sets up reference genomes for megablast.

The Linux shell script that I prepared uses your current prepdb.py script.

The workflow of this mini pipeline is:

1. Read the input values:
* FASTA filename (nucleotide)
* BLAST format: F -> nucleotide, T -> protein
* database output name
* FASTA sequence ID format (ncbi-gbk, other)

where:

ncbi-gbk means something like ">gi|gi-number|gb|accession|locus"
other means something like ">my own IDs"

2. If the FASTA sequence IDs are in NCBI format (ncbi-gk), run the prepdb.py Python script to generate a new FASTA file, then go to step 4.
Example:

>gi|gi-number|gb|accession|locus

3. If the FASTA sequence IDs use different (other) identifiers, first convert them to the Galaxy format, then run the prepdb.py Python script and go to step 4.
Example:

>my own identifier

4. Format the database using the formatdb BLAST utility.

5. Set up the Megablast Galaxy loc file.

6. Delete the temporary files.

7. Done.
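
To illustrate the decision between steps 2 and 3, here is a minimal sketch of how the ID format could be detected from the FASTA file itself. This is not taken from set_galaxy_blastdb.sh: the function name classify_fasta_ids is hypothetical, and it assumes that the first header line is representative of the whole file.

```shell
# Hypothetical helper: classify the sequence ID style of a FASTA file.
# Prints "ncbi" for headers like >gi|gi-number|gb|accession|locus,
# and "other" for anything else.
# Assumption: the first header line is representative of the whole file.
classify_fasta_ids() {
    # Print the first line that starts with '>' and stop reading.
    header=$(sed -n '/^>/{p;q;}' "$1")
    if [ -z "$header" ]; then
        echo "no FASTA header found in $1" >&2
        return 1
    fi
    case "$header" in
        '>gi|'*'|gb|'*) echo "ncbi" ;;
        *)              echo "other" ;;
    esac
}
```

A caller could then branch on the printed value to choose between step 2 and step 3 before moving on to step 4.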

Usage example:

Show the help:
./set_galaxy_blastdb.sh -help

Format a yeast nucleotide BLAST database and set up the Galaxy loc file:
./set_galaxy_blastdb.sh -fasta /media/xlinux/genomes/galaxy/palomero/megablast/yeast_nt.fasta -p F -n yeastdb -fastadesc ncbi-gk
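
For readers who want to see how these flags fit together, here is a minimal sketch of an option parser. Only the flag names (-help, -fasta, -p, -n, -fastadesc) come from the usage above. The function name parse_args, the variable names, the default values, and the validation rules are assumptions for illustration and may differ from what set_galaxy_blastdb.sh really does.

```shell
# Hypothetical option parser; only the flag names come from the usage example.
# Sets: fasta, is_protein, dbname, idfmt.
parse_args() {
    # Assumed defaults: nucleotide database, non-NCBI identifiers.
    fasta=""; is_protein="F"; dbname=""; idfmt="other"
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -help)
                echo "usage: set_galaxy_blastdb.sh -fasta FILE -p F|T -n NAME -fastadesc FORMAT"
                # Distinct status so the caller can stop without running the pipeline.
                return 2 ;;
            -fasta|-p|-n|-fastadesc)
                # Every one of these flags needs a value after it.
                if [ "$#" -lt 2 ]; then
                    echo "missing value for $1" >&2
                    return 1
                fi
                case "$1" in
                    -fasta)     fasta=$2 ;;
                    -p)         is_protein=$2 ;;
                    -n)         dbname=$2 ;;
                    -fastadesc) idfmt=$2 ;;
                esac
                shift 2 ;;
            *)
                echo "unknown option: $1" >&2
                return 1 ;;
        esac
    done
    # -p follows the F (nucleotide) / T (protein) convention described above.
    case "$is_protein" in
        F|T) ;;
        *) echo "-p must be F or T" >&2; return 1 ;;
    esac
    if [ -z "$fasta" ] || [ -z "$dbname" ]; then
        echo "-fasta and -n are required" >&2
        return 1
    fi
}
```

Calling parse_args "$@" at the top of a script would leave the four variables set for the later steps, or return a non-zero status if the arguments are incomplete.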

When the script finishes, we have the following results:

$ ls /media/xlinux/genomes/galaxy/palomero/megablast/
blastdb.loc new.yeast_nt.fasta yeastdb.nhr yeastdb.nin yeastdb.nsq yeast_nt.fasta
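
A result like this could also be checked automatically. The sketch below is only an illustration: the function names expected_db_files and check_db_files are hypothetical, and it assumes a single-volume database, where formatdb writes .nhr/.nin/.nsq files for nucleotide input and .phr/.pin/.psq files for protein input.

```shell
# Hypothetical helper: print the index files expected for a database.
# $1 = database base name, $2 = F (nucleotide) or T (protein)
# Assumption: single-volume database (no numbered volume files).
expected_db_files() {
    case "$2" in
        F) prefix=n ;;
        T) prefix=p ;;
        *) echo "second argument must be F or T" >&2; return 1 ;;
    esac
    for ext in hr in sq; do
        echo "$1.${prefix}${ext}"
    done
}

# Hypothetical helper: return 0 only if every expected file exists.
# $1 = database base name, $2 = F or T, $3 = output directory
# Note: base names containing whitespace are not supported by this sketch.
check_db_files() {
    files=$(expected_db_files "$1" "$2") || return 1
    for f in $files; do
        if [ ! -f "$3/$f" ]; then
            echo "missing: $3/$f" >&2
            return 1
        fi
    done
}
```

For the example above, check_db_files yeastdb F /media/xlinux/genomes/galaxy/palomero/megablast would succeed only if all three yeastdb index files are present.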

How can I contribute this shell script?

Thanks in advance,

--
Jacob
http://www.langebio.cinvestav.mx/bioinformatica/jacob
