Curtis,
[curtish@cheaha galaxy]$ find . -name "*.py" | xargs grep sam_fa_indices.loc
./tools/samtools/sam_pileup.py: seqFile = '%s/sam_fa_indices.loc' % GALAXY_DATA_INDEX_DIR
...
./tools/ngs_rna/cufflinks_wrapper_without_gtf.py: cached_seqs_pointer_file = os.path.join( options.index_dir, 'sam_fa_indices.loc' )
Is there any place in galaxy-core where such a core service lives and could be used by all these adaptors, rather than replicating the code everywhere?
Not yet, but this is definitely needed. However, tools and Galaxy must remain independent, so the location of any needed indices should be passed to the tool on the command line rather than having tools call into Galaxy.
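As a rough illustration of that approach, here is a minimal sketch of a tool-side helper that receives the index directory as a command-line argument and never imports anything from Galaxy. The function names, the --index-dir and --dbkey options, and the tab-separated layout (index, dbkey, path) assumed for sam_fa_indices.loc are illustrative assumptions, not an existing Galaxy API.

```python
import argparse
import os


def find_cached_fasta(index_dir, dbkey, loc_name="sam_fa_indices.loc"):
    """Return the FASTA path listed for dbkey in the .loc file, or None.

    Assumed (hypothetical) line layout: index <TAB> dbkey <TAB> path
    """
    loc_path = os.path.join(index_dir, loc_name)
    with open(loc_path) as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            # Skip blank lines and comments.
            if not line or line.startswith("#"):
                continue
            fields = line.split("\t")
            if len(fields) >= 3 and fields[0] == "index" and fields[1] == dbkey:
                return fields[2]
    return None


def main(argv=None):
    """Parse the index location from the command line and look up a genome."""
    parser = argparse.ArgumentParser(
        description="Look up a cached FASTA index without calling into Galaxy."
    )
    # Both options are hypothetical names chosen for this sketch.
    parser.add_argument("--index-dir", required=True,
                        help="directory containing sam_fa_indices.loc")
    parser.add_argument("--dbkey", required=True,
                        help="genome build to look up, e.g. hg18")
    args = parser.parse_args(argv)
    return find_cached_fasta(args.index_dir, args.dbkey)
```

A wrapper could then share this one helper instead of repeating the lookup, for example by calling main(["--index-dir", "/path/to/tool-data", "--dbkey", "hg18"]), with Galaxy supplying both values when it builds the tool's command line.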
As a related question, for fasta genomes from the current history, these wrappers compute the .fai file on the fly, in TMP, then throw it away, every time. Has there been any discussion about storing such derived indices in the dataset’s metadata (like the .bai file on a .bam data set), so it gets computed once, then re-used?
Converted datasets, which subsume indices-as-metadata, can store dataset indices. Extending converted datasets to store indices created on the fly is also very much needed. Any community contributions that address these issues would be most welcome.

Best,
J.