Lost sequence info on update
Hi,
I've recently done an svn update on my local Galaxy install, which all seemed to go well. However, in the history pane, the information for fasta and solexafastq files no longer shows the number of sequences in the file; it only gives the file size. How do I get back the sequence information for those file types?
Thanks,
Chris
Hello Chris,

The more recent version of the Galaxy code to which you've upgraded changes the set_peek() methods of the datatype classes so that they use less memory. The previous version of the code reported the number of sequences in each file, but doing so was memory intensive for large files.

To restore the previous behavior in your local instance, you'll need to revert the set_peek() methods of the Fasta and FastqSolexa classes in ~/lib/galaxy/datatypes/sequence.py to the following:

    class Fasta( Sequence ):
        """Class representing a FASTA sequence"""
        file_ext = "fasta"

        def set_peek( self, dataset ):
            dataset.peek = data.get_file_peek( dataset.file_name )
            count = size = 0
            for line in file( dataset.file_name ):
                if line and line[0] == ">":
                    # Header line: one per sequence
                    count += 1
                else:
                    # Sequence line: add its length to the base total
                    line = line.strip()
                    size += len(line)
            if count == 1:
                dataset.blurb = '%d bases' % size
            else:
                dataset.blurb = '%d sequences' % count

    class FastqSolexa( Sequence ):
        """Class representing a FASTQ sequence ( the Solexa variant )"""
        file_ext = "fastqsolexa"

        def set_peek( self, dataset ):
            dataset.peek = data.get_file_peek( dataset.file_name )
            count = size = 0
            # Needs the re module to be imported in sequence.py
            bases_regexp = re.compile("^[NGTAC]*$")
            for i, line in enumerate(file( dataset.file_name )):
                if line and line[0] == "@" and i % 4 == 0:
                    # Header line: first line of each 4-line record
                    count += 1
                elif bases_regexp.match(line):
                    # Bases line: add its length to the base total
                    line = line.strip()
                    size += len(line)
            if count == 1:
                dataset.blurb = '%d bases' % size
            else:
                dataset.blurb = '%d sequences' % count

Greg Von Kuster
Galaxy Development Team
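
To check the counts independently of Galaxy, the same counting logic can be run as a standalone script. The following is only a minimal sketch, not Galaxy code: the function names count_fasta, count_fastq and blurb are hypothetical, it is written for Python 3 (so it uses open() rather than file()), and for FASTQ it identifies the bases line by its position in each 4-line record instead of the regular expression used in set_peek() above. It assumes uncompressed files with strict 4-line FASTQ records.

```python
def count_fasta(path):
    """Return (number of sequences, total bases) for a FASTA file."""
    count = size = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                # Header line: one per sequence
                count += 1
            else:
                # Sequence line: add its length, ignoring the newline
                size += len(line.strip())
    return count, size


def count_fastq(path):
    """Return (number of sequences, total bases) for a 4-line-per-record FASTQ file."""
    count = size = 0
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0 and line.startswith("@"):
                # Header line: first line of each record
                count += 1
            elif i % 4 == 1:
                # Bases line: second line of each record (assumption of this sketch)
                size += len(line.strip())
    return count, size


def blurb(count, size):
    """Format the summary the same way the set_peek() methods do."""
    if count == 1:
        return "%d bases" % size
    return "%d sequences" % count
```

For example, blurb(*count_fasta("reads.fasta")) should give the same text that the reverted set_peek() puts in the history pane, where "reads.fasta" stands for any local FASTA file.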