Hi Galaxy Dev,
Nice to meet you.
My name is Jacob and I am from México. I currently work in
bioinformatics at LANGEBIO (Laboratorio Nacional de Genomica para la
Biodiversidad).
I would like to contribute to the Galaxy dev team. I developed a shell
script that sets up reference genomes for megablast.
The Linux shell script that I prepared uses your current prepdb.py
script.
The mini pipeline workflow:
1. Read input values:
* FASTA filename (nucleotide)
* BLAST format: F->nucleotide, T->protein
* database output name
* FASTA sequence ID format (ncbi-gbk, other)
where:
ncbi-gbk means something like ">gi|gi-number|gb|accession|locus"
other means something like ">my own IDs"
2. If the FASTA sequence IDs are in NCBI format (ncbi-gk), execute the
prepdb.py Python script to generate a new FASTA file, then go to step 4.
Example:
>gi|gi-number|gb|accession|locus
3. If the FASTA sequence IDs use different (other) identifiers, convert
them to Galaxy format first, then execute the prepdb.py Python script
and go to step 4.
Example:
>my own identifier
4. Format the database using the formatdb BLAST utility.
5. Set up the Megablast Galaxy loc file.
6. Delete temp files.
7. Done.
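To make the steps above concrete, here is a minimal sketch of the decision logic in shell. It is not the actual set_galaxy_blastdb.sh: the helper names (classify_header, fasta_id_format, formatdb_cmd, loc_line) are hypothetical, the three-column loc-file layout is an assumption, and the prepdb.py call is left out because its arguments are not described in this message. The labels "ncbi-gbk" and "other" follow the input list in step 1.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Galaxy or BLAST.

# Steps 2/3: decide whether one FASTA header line looks like
# ">gi|gi-number|gb|accession|locus" (ncbi-gbk) or anything else (other).
classify_header() {
    case "$1" in
        ">gi|"*"|"*"|"*) echo "ncbi-gbk" ;;
        *)               echo "other" ;;
    esac
}

# Classify a whole FASTA file by looking at its first header line only.
fasta_id_format() {
    classify_header "$(grep -m 1 '^>' "$1")"
}

# Step 4: build (but do not run) the formatdb command line.
# $1 = FASTA file, $2 = F (nucleotide) or T (protein), $3 = database name
formatdb_cmd() {
    echo "formatdb -i $1 -p $2 -n $3"
}

# Step 5: build one loc-file entry.
# ASSUMPTION: three tab-separated columns (id, description, database path);
# check the sample loc file shipped with your Galaxy version.
loc_line() {
    printf '%s\t%s\t%s\n' "$1" "$2" "$3"
}
```

For example, `classify_header '>my own identifier'` prints `other`, and `formatdb_cmd new.yeast_nt.fasta F yeastdb` prints the command that would be run instead of executing it, which keeps the sketch usable on a machine without BLAST installed.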
Usage example:
Show help:
./set_galaxy_blastdb.sh -help
Format a BLAST nucleotide database for yeast and set up the Galaxy loc file:
./set_galaxy_blastdb.sh -fasta \
/media/xlinux/genomes/galaxy/palomero/megablast/yeast_nt.fasta -p F \
-n yeastdb -fastadesc ncbi-gk
When the script finishes, we have the following results:
$ ls /media/xlinux/genomes/galaxy/palomero/megablast/
blastdb.loc new.yeast_nt.fasta yeastdb.nhr yeastdb.nin
yeastdb.nsq yeast_nt.fasta
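A quick way to confirm that the formatting step produced a complete nucleotide database is to check for the three files shown in the listing above (.nhr, .nin, .nsq). This is a minimal sketch with a hypothetical helper name (check_nucl_db); it assumes a single-volume database and is not part of the script itself.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if <dir>/<name>.nhr, .nin and .nsq all exist.
# $1 = directory holding the database, $2 = database name (e.g. yeastdb)
check_nucl_db() {
    dir=$1
    name=$2
    for ext in nhr nin nsq; do
        if [ ! -f "$dir/$name.$ext" ]; then
            echo "missing: $dir/$name.$ext" >&2
            return 1
        fi
    done
    return 0
}
```

With the paths from the example, this would be called as `check_nucl_db /media/xlinux/genomes/galaxy/palomero/megablast yeastdb`.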
How can I contribute this shell script?
Thanks in advance
--
Jacob
http://www.langebio.cinvestav.mx/bioinformatica/jacob