I am trying to find out whether my ChIP-seq data set contains known repeat sequences that are not uniquely alignable and are therefore discarded by the ELAND algorithm when it writes the sorted.txt file. I understand those sequences are still present in the export.txt file. I have been trying to upload that file to Galaxy, but have not been able to yet. Does anyone know the file size limit? Does anyone know the best way to compress such a file for upload? I tried to gzip it, but for some reason the gzipped file has a .gzip.tmp filename, and I'm not sure whether Galaxy can handle this. Also, if anyone has other suggestions on how to analyze the repeated portion of a ChIP-seq data set, I'd greatly appreciate them.
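On the compression question, here is a minimal Python sketch of one way to produce a plain `.gz` file. It assumes the `.gzip.tmp` name came from an interrupted run or from a tool that writes to a temporary name first, which the thread does not confirm. The function name `gzip_file` and its parameters are made up for illustration; only the standard-library `gzip` and `shutil` modules are used.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(src, dst=None, chunk_size=1024 * 1024):
    """Compress src into a gzip file and return the output path.

    If dst is not given, ".gz" is appended to the source file name
    (for example export.txt -> export.txt.gz). The original is kept.
    """
    src = Path(src)
    dst = Path(dst) if dst is not None else src.with_name(src.name + ".gz")
    # Stream in chunks so a multi-gigabyte file never has to fit in memory.
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk_size)
    return dst
```

Calling `gzip_file("export.txt")` would write `export.txt.gz` next to the original. Whether a given Galaxy server accepts the result, and at what size, still depends on that server's configuration.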
Hi Keith,

I previously could not load large SOLiD sequencer files into Galaxy until I started putting the files on a web server and giving Galaxy the URL of each file. My Linux machine now has a /public_html directory for serving files, and some of the files exceed 4 GB. You could also try converting your sequence file into the simpler BED format (coordinates only).

Ian
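For the BED suggestion, the following is a rough sketch rather than a tested converter for ELAND output. The column positions (chromosome in field 11, 1-based position in field 13, strand as `F`/`R` in field 14, read sequence in field 9) are assumptions about the tab-separated export.txt layout and should be checked against your own file. The function name `export_to_bed` and all of its parameters are hypothetical.

```python
def export_to_bed(export_path, bed_path,
                  chrom_col=10, pos_col=12, strand_col=13, read_col=8):
    """Write one BED line per read that has a numeric match position.

    Column indices are 0-based and are ASSUMED defaults, not a confirmed
    description of the export.txt format. Returns the number of lines written.
    """
    needed = max(chrom_col, pos_col, strand_col, read_col)
    written = 0
    with open(export_path) as fin, open(bed_path, "w") as fout:
        for line_no, line in enumerate(fin, start=1):
            fields = line.rstrip("\n").split("\t")
            if len(fields) <= needed:
                continue  # malformed or truncated line
            pos = fields[pos_col]
            if not pos.isdigit() or int(pos) < 1:
                continue  # no single coordinate reported for this read
            start = int(pos) - 1  # BED starts are 0-based
            end = start + len(fields[read_col])  # BED ends are exclusive
            strand = "+" if fields[strand_col] == "F" else "-"
            fout.write(f"{fields[chrom_col]}\t{start}\t{end}\t"
                       f"read_{line_no}\t0\t{strand}\n")
            written += 1
    return written
```

Note that reads without a numeric position are skipped, and that may well include the non-uniquely aligned reads the original question is about. A coordinates-only BED file is therefore more likely to help with the upload size than with the repeat analysis itself.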
participants (2)
- Ian Donaldson
- Keith E. Giles