Folks,

I was writing a samtools mpileup wrapper for our local use. When I looked into how the existing samtools/sam_pileup.py adaptor works, I found that it has its own local copy of the routine that looks up the samtools .fai file in sam_fa_indices.loc for “installed” genomes. I then noticed that this routine is duplicated in many different adaptors:

[curtish@cheaha galaxy]$ find . -name "*.py"  | xargs grep sam_fa_indices.loc
./tools/samtools/sam_pileup.py:    seqFile = '%s/sam_fa_indices.loc' % GALAXY_DATA_INDEX_DIR
./tools/samtools/sam_mpileup_view.py:    seqFile = '%s/sam_fa_indices.loc' % GALAXY_DATA_INDEX_DIR
./tools/samtools/sam_to_bam.py:    cached_seqs_pointer_file = '%s/sam_fa_indices.loc' % options.index_dir
./tools/ngs_rna/cufflinks_wrapper_with_gtf.py:        cached_seqs_pointer_file = os.path.join( options.index_dir, 'sam_fa_indices.loc' )
./tools/ngs_rna/cuffdiff_wrapper.py:        cached_seqs_pointer_file = os.path.join( options.index_dir, 'sam_fa_indices.loc' )
./tools/ngs_rna/cufflinks_wrapper.py:        cached_seqs_pointer_file = os.path.join( options.index_dir, 'sam_fa_indices.loc' )
./tools/ngs_rna/cuffcompare_wrapper.py:        cached_seqs_pointer_file = os.path.join( options.index_dir, 'sam_fa_indices.loc' )
./tools/ngs_rna/cufflinks_wrapper_without_gtf.py:        cached_seqs_pointer_file = os.path.join( options.index_dir, 'sam_fa_indices.loc' )

Is there any place in galaxy-core where such a shared service lives, so that all of these adaptors could use it instead of each replicating the code?
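
To make the question concrete, something along these lines is what I have in mind. This is only a minimal sketch, not existing Galaxy code: the function names are hypothetical, and it assumes the tab-separated layout shown in sam_fa_indices.loc.sample (a literal "index" column, then the dbkey, then the path to the FASTA file, with samtools expecting the matching .fai at that path plus ".fai").

```python
import os


def load_fa_index_table(loc_path):
    """Parse a sam_fa_indices.loc-style file into {dbkey: fasta_path}.

    Hypothetical helper. Assumes tab-separated lines of the form
    index<TAB>dbkey<TAB>/path/to/genome.fa; blank lines, comment lines
    and malformed lines are skipped. The first entry for a dbkey wins.
    """
    table = {}
    if not os.path.exists(loc_path):
        return table
    with open(loc_path) as handle:
        for raw in handle:
            line = raw.rstrip("\r\n")
            if not line or line.startswith("#"):
                continue
            fields = line.split("\t")
            if len(fields) >= 3 and fields[0] == "index":
                table.setdefault(fields[1], fields[2])
    return table


def find_cached_fasta(index_dir, dbkey):
    """Return the FASTA path registered for dbkey, or None if it is not installed."""
    loc_path = os.path.join(index_dir, "sam_fa_indices.loc")
    return load_fa_index_table(loc_path).get(dbkey)
```

Each wrapper would then reduce to a single call such as find_cached_fasta(options.index_dir, dbkey), falling back to indexing a FASTA dataset from the history when it returns None.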

As a related question: for FASTA genomes from the current history, these wrappers build the .fai index on the fly in a temporary directory and then throw it away, on every run. Has there been any discussion about storing such derived indices in the dataset’s metadata (as is done with the .bai file for a .bam dataset), so that the index is computed once and then reused?
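
The compute-once behaviour I am picturing looks roughly like this. It is a hypothetical sketch, not how Galaxy stores metadata: cache_dir stands in for wherever a dataset's metadata files would live, staleness is judged only by modification time, and build_fai is a small pure-Python stand-in for samtools faidx that writes the same five .fai columns (name, length, byte offset of the first base, bases per line, bytes per line) and assumes uniform line lengths within each record.

```python
import os


def build_fai(fasta_path, fai_path):
    """Write a samtools-faidx-style index for fasta_path (minimal sketch)."""
    records = []
    current = None  # [name, length, offset, linebases, linewidth] of the open record
    pos = 0  # byte position of the start of the current line
    with open(fasta_path, "rb") as handle:
        for line in handle:
            if line.startswith(b">"):
                if current is not None:
                    records.append(current)
                fields = line[1:].split()
                name = fields[0].decode() if fields else ""
                # Sequence data starts right after the header line.
                current = [name, 0, pos + len(line), 0, 0]
            elif current is not None:
                bases = len(line.rstrip(b"\r\n"))
                if current[3] == 0 and bases:
                    # Record the layout from the first sequence line.
                    current[3], current[4] = bases, len(line)
                current[1] += bases
            pos += len(line)
    if current is not None:
        records.append(current)
    with open(fai_path, "w") as out:
        for rec in records:
            out.write("\t".join(str(v) for v in rec) + "\n")


def cached_fai(fasta_path, cache_dir):
    """Return a .fai for fasta_path, rebuilding it only if missing or older than the FASTA."""
    os.makedirs(cache_dir, exist_ok=True)
    fai_path = os.path.join(cache_dir, os.path.basename(fasta_path) + ".fai")
    if (not os.path.exists(fai_path)
            or os.path.getmtime(fai_path) < os.path.getmtime(fasta_path)):
        build_fai(fasta_path, fai_path)
    return fai_path
```

Since samtools looks for the index next to the FASTA, a wrapper would presumably still symlink the FASTA and the cached .fai into one working directory before calling it, but the indexing itself would happen only once per dataset.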