Hi Galaxy Dev,

Nice to meet you. My name is Jacob and I am from México. I currently work in bioinformatics at LANGEBIO (Laboratorio Nacional de Genomica para la Biodiversidad). I would like to contribute to the Galaxy dev team: I developed a shell script that sets up reference genomes for Megablast. The Linux shell script I prepared uses your current prepdb.py script.

The mini pipeline workflow:

1. Read the input values:
   * FASTA filename (nucleotide)
   * BLAST format: F -> nucleotide, T -> protein
   * database output name
   * FASTA sequence ID format (ncbi-gbk, other), where:
     ncbi-gbk means something like ">gi|gi-number|gb|accession|locus"
     other means something like ">my own IDs"
2. If the FASTA sequence IDs are in NCBI format (ncbi-gk), execute the prepdb.py Python script to generate a new FASTA file, then go to step 4. Example:
   >gi|gi-number|gb|accession|locus
3. If the FASTA sequence IDs use different (other) identifiers, first convert them to the Galaxy format, then execute the prepdb.py Python script and go to step 4. Example:
   >my own identifier
4. Format the database using the formatdb BLAST utility.
5. Set up the Megablast Galaxy loc file.
6. Delete temp files.
7. Done.

Usage example:

Query help:
./set_galaxy_blastdb.sh -help

Format the BLAST nucleotide database yeast and set up the Galaxy loc file:
./set_galaxy_blastdb.sh -fasta /media/xlinux/genomes/galaxy/palomero/megablast/yeast_nt.fasta -p F -n yeastdb -fastadesc ncbi-gk

When the script finishes, we have the following results:
$ ls /media/xlinux/genomes/galaxy/palomero/megablast/
blastdb.loc new.yeast_nt.fasta yeastdb.nhr yeastdb.nin yeastdb.nsq yeast_nt.fasta

How can I contribute this shell script?

Thanks in advance
--
Jacob
http://www.langebio.cinvestav.mx/bioinformatica/jacob
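The message describes the script's steps but not its code. The following is a minimal bash sketch of steps 1 and 3 under stated assumptions: `parse_args` and `relabel_headers` are hypothetical names, not functions from set_galaxy_blastdb.sh or prepdb.py, and the `>gi|<n>|gb|<id>|<id>` layout used for "other" IDs is an assumed target that mimics the NCBI example, not a format confirmed for prepdb.py. Steps 4 to 6 (formatdb, the loc file, cleanup) are not covered.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of steps 1 and 3; these helper names are not
# taken from set_galaxy_blastdb.sh or prepdb.py.

# Step 1: read the options used in the usage example into variables.
# -p follows formatdb's convention: T = protein, F = nucleotide.
parse_args() {
    FASTA="" PROTEIN="" DBNAME="" FASTADESC=""
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -help)
                echo "usage: set_galaxy_blastdb.sh -fasta FILE -p T|F -n DBNAME -fastadesc ncbi-gk|other"
                return 1 ;;
            -fasta|-p|-n|-fastadesc)
                # each of these options takes exactly one value
                if [ "$#" -lt 2 ]; then
                    echo "missing value for $1" >&2
                    return 1
                fi
                case "$1" in
                    -fasta)     FASTA="$2" ;;
                    -p)         PROTEIN="$2" ;;
                    -n)         DBNAME="$2" ;;
                    -fastadesc) FASTADESC="$2" ;;
                esac
                shift 2 ;;
            *)
                echo "unknown option: $1" >&2
                return 1 ;;
        esac
    done
    case "$PROTEIN" in T|F) ;; *) echo "-p must be T or F" >&2; return 1 ;; esac
    # the message spells the NCBI value both "ncbi-gk" and "ncbi-gbk"
    case "$FASTADESC" in ncbi-gk|ncbi-gbk|other) ;; *) echo "bad -fastadesc" >&2; return 1 ;; esac
    [ -n "$DBNAME" ] || { echo "-n is required" >&2; return 1; }
    [ -r "$FASTA" ] || { echo "cannot read $FASTA" >&2; return 1; }
}

# Step 3 (assumed target layout): rewrite ">my_id description" as
# ">gi|<n>|gb|my_id|my_id description", numbering records from 1,
# so the headers resemble the NCBI example before prepdb.py runs.
relabel_headers() {
    awk '/^>/ {
             n++
             id = substr($1, 2)                 # drop the leading ">"
             rest = substr($0, length($1) + 1)  # description, if any
             printf(">gi|%d|gb|%s|%s%s\n", n, id, id, rest)
             next
         }
         { print }' "$1"
}
```

In a driver script, one would call `parse_args "$@" || exit 1` and, when `$FASTADESC` is `other`, redirect `relabel_headers "$FASTA"` into a temporary file before handing it to prepdb.py, whose command-line arguments are not shown in the message.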