Greetings, I was adding datatype classes to galaxy.datatype.text.py for our custom JSON formats and I noticed what I think is a problem with the Json datatype class, in particular the _looks_like_json method. The basic algorithm is (in pseudo code): if the_file_is_small_enough: try to load the the file as json return True on success, False otherwise else while True find first non-blank line if line starts with ‘{‘ or ‘[‘: return True else return False However, JSON is typically compressed by stripping whitespace (including newlines), particularly when it is fetched from a web service. This means that the first call to readLine is going to load the entire file, which defeats the purpose of checking the file size in the first place. I would submit a pull request with our fix, but since I am not a Python programmer I thought I would simply post the code here for review. class Json(Text): # Unchanged bits of the class have been omitted. # Add a function that reads a single character skipping over whitespace. def read(self, fileHandle): ch = fileHandle.read(1) while ch.isspace(): ch = fileHandle.read(1) return ch # Only the “else-part” has changed def _looks_like_json(self, filename): # Pattern used by SequenceSplitLocations if os.path.getsize(filename) < 50000: # If the file is small enough - don't guess just check. try: json.load(open(filename, "r")) return True except Exception: return False else: with open(filename, "r") as fh: ch = self.read(fh) return ch == '{' or ch == '[' We then use the read() method to sniff out our JSON formats. Once we can sniff a header subclasses only need to specify the header. class Lapps( Json ): header = ‘’’{“discriminator”:”http://vocab.lappsgrid.org''' def sniff(self, filename) with open(filename) as fh: for c in header: if c != self.read(fh) return False return True class Lif ( Lapps ): header = ‘’’{“discriminator”:”http://vocab.lappsgrid.org/ns/media/jsonld#lif”''' class Gate( Lapps ): header = ‘’’{“discriminator”:”http://vocab.lappsgrid.org/ns/media/gate”''' … Regardless of how the JSON is rendered (pretty printed or not) we can match the “magic” strings used to identify our formats. Being new to Python I don’t know how expensive it is to read a file one character at a time, but it has to be cheaper than potentially reading the entire file. Cheers, Keith ------------------------------ Research Associate Department of Computer Science Vassar College Poughkeepsie, NY