extract_genomic_dna.py checks alignseq.loc?
Hi all,

I recently upgraded our local instance of galaxy to the latest revision of galaxy-dist. Now I tried to use "Extract Genomic DNA" with a 3 column bed file, which only contains chr, start and end, with database/build set to hg19.

First of all, running the module results in the following error:

--->%---
Traceback (most recent call last):
  File "/local/data/home/galaxy/galaxy-dist-2011-11-23/tools/extract/extract_genomic_dna.py", line 283, in <module>
    if __name__ == "__main__": __main__()
  File "/local/data/home/galaxy/galaxy-dist-2011-11-23/tools/extract/extract_genomic_dna.py", line 107, in __main__
    seq_path = check_seq_file( dbkey, GALAXY_DATA_INDEX_DIR )
  File "/local/data/home/galaxy/galaxy-dist-2011-11-23/tools/extract/extract_genomic_dna.py", line 40, in check_seq_file
    for line in open( seq_file ):
NameError: global name 'seq_file' is not defined
---%<---

I went to the respective file and found that the line which defines seq_file is commented out: (l.38 in def check_seq_file())

## seq_file = "%s/alignseq.loc" % GALAXY_DATA_INDEX_DIR

This seems to be a bug in the current version of the file.

Removing the comment, the script tries to check for sequence entries in alignseq.loc, which I left empty before, since I didn't need aligned sequences in galaxy until now. Of course this results in another error:

'No sequences are available for 'hg19', request them by reporting this error.'

I just wanted to raise the question if this dependency is right, wouldn't one rather like to check for the respective build in faseq.loc (unfortunately the file format is different, it doesn't contain the seq in the first column).

Is there a fix for this somewhere already, or did I misunderstand how this is supposed to work?

Cheers,
Holger

--
Dr. Holger Klein
Core Facility Bioinformatics
Institute of Molecular Biology gGmbH (IMB)
http://www.imb-mainz.de/
Tel: +49(6131) 39 21511
Hi Holger,
> I went to the respective file and found that the line which defines seq_file is commented out: (l.38 in def check_seq_file())
> ## seq_file = "%s/alignseq.loc" % GALAXY_DATA_INDEX_DIR
> This seems to be a bug in the current version of the file.
There is clearly something amiss. In the galaxy-dist source, this line is not commented: https://bitbucket.org/galaxy/galaxy-dist/src/b258de1e6cea/tools/extract/extr...
> Removing the comment, the script tries to check for sequence entries in alignseq.loc, which I left empty before, since I didn't need aligned sequences in galaxy until now. Of course this results in another error:
> 'No sequences are available for 'hg19', request them by reporting this error.'
This is the correct behavior.
> I just wanted to raise the question if this dependency is right, wouldn't one rather like to check for the respective build in faseq.loc (unfortunately the file format is different, it doesn't contain the seq in the first column).
Yes, faseq.loc should be used. The use of alignseq.loc is a historical artifact that we haven't fixed yet. If you're inclined to fix it, we'd be happy to incorporate the changes into the code base.

Best,
J.
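For anyone picking up the fix suggested above, here is a minimal sketch of what a faseq.loc-based lookup might look like. It is not the code in galaxy-dist. The function name find_seq_path and its parameters are hypothetical, and the file layout is an assumption: one tab-separated line per build, with the build name in the first column and the sequence path in the second (that is, without the leading "seq" column that alignseq.loc uses). Check your own faseq.loc before relying on it.

```python
import os


def find_seq_path(dbkey, data_index_dir, loc_name="faseq.loc"):
    """Return the path registered for dbkey in a .loc file, or '' if none.

    Hypothetical helper; assumes lines of the form <build>\t<path>.
    """
    loc_file = os.path.join(data_index_dir, loc_name)
    if not os.path.exists(loc_file):
        return ""
    with open(loc_file) as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            # Skip blank lines and comment lines.
            if not line or line.startswith("#"):
                continue
            fields = line.split("\t")
            # Ignore lines that do not have at least <build> and <path>.
            if len(fields) < 2:
                continue
            if fields[0].strip() == dbkey:
                return fields[1].strip()
    return ""
```

An empty return value would correspond to the case where the tool reports that no sequences are available for the requested build.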
participants (2)
- Holger Klein
- Jeremy Goecks