Hi all,
 
I recently tried uploading a couple of XML files to my local Galaxy installation using the standard ‘Upload File (version 1.1.3)’ tool. For some files this produced the error: The uploaded file contains inappropriate HTML content.
 
Given that the files had been created by the same automated code and contained the same tags, I couldn't understand why one would produce this error and another would not.
 
After finally tracking down the function check_html() in galaxy-dist/lib/galaxy/datatypes/checkers.py, I discovered that my <metabolite> tag had been flagged as a likely <META > tag, which produced the warning and the failed upload.
 
The reason this did not happen in every case is that check_html() only reads the first 100 lines of the file. Depending on how many samples were in my dataset, my <metabolite> tag could appear before or after this cutoff.
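
To illustrate the behaviour, here is a minimal sketch, not the actual Galaxy code. It assumes the check is a case-insensitive pattern along the lines of `<META[^>]*>` applied line by line, stopping after 100 lines; the names LOOSE_META, looks_like_html and make_file are made up for this example.

```python
import re

# Assumed pattern, modelled on the behaviour I observed;
# the real expression in checkers.py may differ.
LOOSE_META = re.compile(r"<META[^>]*>", re.IGNORECASE)


def looks_like_html(lines, max_lines=100):
    """Return True if any of the first max_lines lines matches LOOSE_META."""
    for lineno, line in enumerate(lines, start=1):
        if lineno > max_lines:
            break  # lines beyond the cutoff are never inspected
        if LOOSE_META.search(line):
            return True
    return False


def make_file(n_samples):
    """Build a toy file: n_samples sample lines, then one <metabolite> tag."""
    return ["<sample/>"] * n_samples + ['<metabolite name="x">']


few_samples = make_file(10)    # <metabolite> lands on line 11
many_samples = make_file(200)  # <metabolite> lands on line 201

print(looks_like_html(few_samples))   # tag is inside the cutoff
print(looks_like_html(many_samples))  # tag is past the cutoff
```

Under these assumptions, the small file is rejected and the large one is accepted, even though both contain the same tag.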
 
I've solved the problem by changing my XML tag names, but my questions are:
 
a) why does check_html only read up to line 100?
b) would it be possible to change the regular expressions in check_html so that e.g. <meta ...> would be found but e.g. <metaxxx ...> would not?
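
For (b), something like the following might work. This is only a sketch under my own assumptions: STRICT_META is a hypothetical pattern, not what check_html currently uses. It requires the character after "meta" to be whitespace, "/" or ">", so longer tag names are not matched.

```python
import re

# Hypothetical stricter pattern: the lookahead (?=[\s/>]) demands that the
# tag name ends right after "meta", so <metabolite> and <metaxxx> are skipped.
STRICT_META = re.compile(r"<META(?=[\s/>])[^>]*>", re.IGNORECASE)

candidates = [
    '<meta http-equiv="refresh" content="0">',  # real HTML meta tag
    "<META>",                                   # bare tag, upper case
    '<metabolite id="1">',                      # my XML tag
    "<metaxxx foo>",                            # any other longer name
]

for tag in candidates:
    flagged = STRICT_META.search(tag) is not None
    print(tag, "->", "flagged" if flagged else "ignored")
```

I used a lookahead rather than a plain \b word boundary because \b would still match a name such as <meta-data>, since "-" is not a word character.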
 
Thanks for reading.
 
Rob
 
Dr Robert L Davidson
NERC Metabolomics Facility
School of Biosciences
University of Birmingham
Edgbaston, UK