When to use metadata or dataset.blurb?
So I wanted to add in the number of sequences in a file, which is something that is often requested. My initial thought was to add a tool, which I could do, but then I thought that this was metadata that should probably be added to any record based class. So I then thought that I would find the code that set up the file size information in the history column and copy that, but looking at the source, that goes in the blurb (e.g. sequence.Fasta).

So here are my questions:

* how can a user see the metadata, as this is distinct to the blurb/peek?
* why isn't the file size implemented as a piece of metadata?

thanks,
James
Hello James, see my comments inline.

Greg Von Kuster
Galaxy Development Team

James Casbon wrote:
So I wanted to add in the number of sequences in a file, which is something that is often requested.
We used to provide this information in the peek for some sequence formats. Here is how we did it for sequence.Fasta:

    def set_peek( self, dataset ):
        dataset.peek = data.get_file_peek( dataset.file_name )
        count = size = 0
        for line in file( dataset.file_name ):
            if line and line[0] == ">":
                count += 1
            else:
                line = line.strip()
                size += len(line)
        if count == 1:
            dataset.blurb = '%d bases' % size
        else:
            dataset.blurb = '%d sequences' % count

The problem, obviously, is that the entire file has to be read, which is intensive for large files. Recently, we have implemented the ability to set metadata on cluster nodes, although this is still in test phase, so perhaps we can look at re-introducing features like this in the future.

My initial thought was to add a tool, which I could do, but then I thought that this was metadata that should probably be added to any record based class. So I then thought that I would find the code that set up the file size information in the history column and copy that, but looking at the source, that goes in the blurb (e.g. sequence.Fasta).
So here are my questions: * how can a user see the metadata, as this is distinct to the blurb/peek?
I'm not sure I understand this question, but you can implement your set_peek() method for your data type to place any information you want in the blurb.
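As a rough illustration of that suggestion, here is a minimal sketch of the counting step written as standalone Python 3 rather than as a Galaxy datatype method. The helper names count_fasta and make_blurb are hypothetical and are not part of Galaxy; the blurb wording simply follows the sequence.Fasta code discussed in this thread. Like that code, it still reads the whole file.

```python
import os
import tempfile


def count_fasta(path):
    """Return (number of records, total residues) for a FASTA file.

    Hypothetical helper, not a Galaxy function. Reads the whole file,
    so it has the same cost on large files as the original approach.
    """
    count = size = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                count += 1                  # header line starts a new record
            else:
                size += len(line.strip())   # sequence line: count residues
    return count, size


def make_blurb(count, size):
    """Build the blurb text: bases for a single record, else a record count."""
    if count == 1:
        return "%d bases" % size
    return "%d sequences" % count


if __name__ == "__main__":
    # Write a tiny two-record FASTA file to a temporary path for the demo.
    with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
        tmp.write(">seq1\nACGT\nAC\n>seq2\nGG\n")
    try:
        print(make_blurb(*count_fasta(tmp.name)))
    finally:
        os.remove(tmp.name)
```

Inside a data type's set_peek(), the string returned by make_blurb would be what gets assigned to dataset.blurb.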
* why isn't the file size implemented as a piece of metadata?
This is historical - for quite some time we were not even generating file sizes or doing anything with them, and then about a year or so ago we began using this information for some features. The metadata for a data type is stored in the db as a jsonified string ( originally it was pickled ), and the file size is stored in the db in a separate column. This may not be the best approach, but metadata and file_size were introduced into the system at different times, and file_size was introduced when metadata was pickled, so it was a bit trickier to add the additional information. We can probably look at optimizing some of this in future releases.
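To make the storage layout described above concrete, here is a small sketch in Python 3 of a record that keeps metadata as one JSON string and the file size as a separate value. It is only an illustration: build_row is a hypothetical helper, the dict stands in for a database row, and the key names "metadata" and "file_size" and the example metadata contents are assumptions, not Galaxy's actual schema.

```python
import json
import os
import tempfile


def build_row(path, metadata):
    """Return a dict standing in for one database row (hypothetical layout)."""
    return {
        # All metadata fields are serialized together into a single JSON string.
        "metadata": json.dumps(metadata, sort_keys=True),
        # The file size is kept on its own, outside the metadata string.
        "file_size": os.path.getsize(path),
    }


if __name__ == "__main__":
    # Create a small temporary file so there is something to measure.
    with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
        tmp.write(">seq1\nACGT\n")
    try:
        row = build_row(tmp.name, {"sequences": 1})
        print(row)
        # Decoding the JSON string gives the metadata back as a plain dict.
        print(json.loads(row["metadata"])["sequences"])
    finally:
        os.remove(tmp.name)
```

In this layout, adding a new metadata field such as a sequence count only means adding a key to the dict before it is serialized, while the file size can be read without decoding the metadata at all.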
thanks, James