Hi Galaxy Dev,
Nice to meet you.
My name is Jacob and I am from México. I currently work in the Bioinformatics group at LANGEBIO (Laboratorio Nacional de Genomica para la Biodiversidad).
I would like to contribute to the Galaxy dev team. I developed a shell script that sets up reference genomes for megablast.
The Linux shell script that I prepared uses your current prepdb.py script.
The mini pipeline workflow:
1. Read the input values:
   * FASTA filename (nucleotide)
   * BLAST format: F -> nucleotide, T -> protein
   * database output name
   * FASTA sequence ID format (ncbi-gbk, other)
   where:
   * ncbi-gbk means something like ">gi|gi-number|gb|accession|locus"
   * other means something like ">my own IDs"
2. If the FASTA sequence IDs are in NCBI format (ncbi-gk), run the prepdb.py Python script to generate a new FASTA file, then go to step 4. Example:
   >gi|gi-number|gb|accession|locus
3. If the FASTA sequence IDs use different (other) identifiers, convert them to Galaxy format first, run the prepdb.py Python script, then go to step 4. Example:
   >my own identifier
4. Format the database using the formatdb BLAST utility.
5. Set up the megablast Galaxy loc file.
6. Delete the temp files.
7. Done.
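The branching on the ID format and the formatdb step above could be sketched roughly as follows. This is only an illustration, not the code from set_galaxy_blastdb.sh: the function names fasta_id_style and formatdb_command are hypothetical, the formatdb command is only printed (dry run), and the prepdb.py call is left out because its arguments are not described in this message.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not taken from set_galaxy_blastdb.sh.

# Print "ncbi" if the first FASTA header looks like ">gi|...|gb|...",
# otherwise print "other".
# $1 = FASTA file
fasta_id_style() {
    # Take the first line that starts with ">" and stop reading.
    first_header=$(sed -n '/^>/{p;q;}' "$1")
    case "$first_header" in
        '>gi|'*'|gb|'*) echo "ncbi" ;;
        *)              echo "other" ;;
    esac
}

# Print the legacy formatdb command for step 4 without running it.
# $1 = FASTA file, $2 = F (nucleotide) or T (protein), $3 = database name
formatdb_command() {
    echo "formatdb -i $1 -p $2 -n $3"
}
```

For example, formatdb_command yeast_nt.fasta F yeastdb prints the formatdb call for a nucleotide database named yeastdb, which can be checked by eye before running it for real.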
Usage example:
Show the help:
  ./set_galaxy_blastdb.sh -help
Format a BLAST nucleotide database for yeast and set the Galaxy loc file:
  ./set_galaxy_blastdb.sh -fasta /media/xlinux/genomes/galaxy/palomero/megablast/yeast_nt.fasta -p F -n yeastdb -fastadesc ncbi-gk
When the script finishes, we have the following results:
  $ ls /media/xlinux/genomes/galaxy/palomero/megablast/
  blastdb.loc  new.yeast_nt.fasta  yeastdb.nhr  yeastdb.nin  yeastdb.nsq  yeast_nt.fasta
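A small check like the one below could confirm that the three nucleotide index files from the listing (.nhr, .nin, .nsq) are all present after a run. It is a sketch under my own assumptions: blastdb_files_present is a hypothetical helper and is not part of set_galaxy_blastdb.sh.

```shell
#!/bin/sh
# Sketch only: blastdb_files_present is a hypothetical helper.

# Return 0 if the .nhr, .nin and .nsq files exist for a nucleotide database.
# $1 = directory, $2 = database name (for example yeastdb)
blastdb_files_present() {
    for ext in nhr nin nsq; do
        if [ ! -f "$1/$2.$ext" ]; then
            echo "missing: $1/$2.$ext" >&2
            return 1
        fi
    done
    return 0
}
```

It could be called as blastdb_files_present /media/xlinux/genomes/galaxy/palomero/megablast yeastdb, with the exit status telling whether the database looks complete.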
How can I contribute this shell script?
Thanks in advance
--
Jacob
http://www.langebio.cinvestav.mx/bioinformatica/jacob