Hi Steve,

It does look like MACS is splitting on spaces and tabs. Spaces are perfectly valid in FASTQ headers. In the meantime, you can work around this problem in Galaxy with the FASTQ manipulation tool, found under Generic FASTQ tools, by translating the spaces in the header to underscores. Select the FASTQ file with spaces in the header as the input, click 'Add new manipulate Reads', set 'Manipulate Reads on' to 'Name/Identifier' and 'Identifier Manipulation Type' to 'String Translate', put a space in 'From' and an underscore in 'To', and click Execute. Because of the way this tool works, it is usually recommended that the output be re-groomed.

Thanks for using Galaxy,

Dan

On May 4, 2010, at 8:15 AM, Stephen Taylor wrote:
Hi Daniel,
What version of MACS are you using? We have:

$ macs --version
macs 1.3.7.1 (Oktoberfest, bug fixed #1)
Thanks for the tip. In response to this mail, we upgraded to this version.
It looks like the job completes, but on closer inspection we get:
Messages from MACS:
INFO @ Tue, 04 May 2010 09:23:37:
# ARGUMENTS LIST:
# name = MACS_in_Galaxy
# format = SAM
# ChIP-seq file = /wwwdata/galaxy-dist/database/files/000/dataset_229.dat
# control file = None
# effective genome size = 2.70e+09
# tag size = 25
# band width = 300
# model fold = 32
# pvalue cutoff = 1.00e-05
# Ranges for calculating regional lambda are : peak_region,1000,5000,10000
INFO @ Tue, 04 May 2010 09:23:37: #1 read tag files...
INFO @ Tue, 04 May 2010 09:23:37: #1 read treatment tags...
Traceback (most recent call last):
  File "/usr/local/pkgbin/macs", line 282, in ?
    main()
  File "/usr/local/pkgbin/macs", line 66, in main
    (treat, control) = load_tag_files_options (options)
  File "/usr/local/pkgbin/macs", line 261, in load_tag_files_options
    treat = options.build(open2(options.tfile, gzip_flag=options.gzip_flag))
  File "/package/macs/1.3.7.1/lib/python2.5/site-packages/MACS/IO/__init__.py", line 1480, in build_fwtrack
    (chromosome,fpos,strand) = self.__fw_parse_line(thisline)
  File "/package/macs/1.3.7.1/lib/python2.5/site-packages/MACS/IO/__init__.py", line 1500, in __fw_parse_line
    bwflag = int(thisfields[1])
ValueError: invalid literal for int(): :8:1:316:468
It turns out the SAM file has got things like 'SRR015129.6 :6:1:909:23 length=36' in the first column, which MACS doesn't like. The SAM files are from FASTQs from NCBI's SRA and have been processed in Galaxy using FASTQ Groomer and then Bowtie. For example, the SAM output:

SRR015129.2 :6:1:236:897 length=36 4 * 0 0 * * 0 0 GTTGAGTATAGCCTTTTGTAGAAGGATGTGATGTTG IIIIIIIIIDI.+IIIIIIEI1+I+2I%I1+I.&5$ XM:i:1
SRR015129.1 :6:1:894:108 length=36 4 * 0 0 * * 0 0 GCTGCCGATCGCACAGATAAAGAAGCCTCAATTGGC I3II1I%II1&+>1+&(7III$I%%'6I0'&*992/ XM:i:0
SRR015129.6 :6:1:909:23 length=36 4 * 0 0 * * 0 0 GCTGCTTCTCTNNTTAGAATGNNNNNNNNNNNNNNN IIII;II1I=I!!III=IAIA!!!!!!!!!!!!!!! XM:i:0
SRR015129.4 0 chr16 8180060 255 36M * 0 0 GGTGTGTTTTTATGCCTCAACCTGAGGCAAAGGTTT IIIIIIIIIIIII>IIIIIIIII;?41D<>3;+III XA:i:0 MD:Z:36 NM:i:0
SRR015129.3 0 chr13 70318444 255 36M * 0 0 GAGATTGGTAGAGAGCATGTGGTTTTCATTATAAAT IIIIIIIIIIII.I;IIIIII:IIIII/(I2II:?I XA:i:0 MD:Z:28G7 NM:i:1
SRR015129.5 0 chr3 22775604 255 36M * 0 0 GGGCATGAAGTTATTTTCAGAGAGCTTTTACTGAAG IIIIBIIIIIIIIIIIIIII:I;AFIIII:5I154+ XA:i:0 MD:Z:36 NM:i:0
SRR015129.7 16 chr17 48330835 255 36M * 0 0 TAAATTGGGTGTGTGTCACAATAAAGTGTGTGTAAC -@I0//7@6);5I8II,I*IIIIIIII<IIIIIII9 XA:i:0 MD:Z:36 NM:i:0
SRR015129.9 16 chr14 24322769 255 36M * 0 0 AGGGCAACTTCTCAACTCTCACCTTGAGGTAAATCC IDI9-IE1I::4GIII:IIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:36 NM:i:0
SRR015129.10 0 chr8 55779424 255 36M * 0 0 GGATCATCCATTGGAACCTGGTGGGATCAACAGTGG IIIIIIIIIIII@CIIIII8IH,.09I63;03*F'' XA:i:0 MD:Z:36 NM:i:0
SRR015129.8 16 chr18 78388563 255 36M * 0 0 ATTTGACCTCTTTCCTTCCCCCTCTTTCTTTTGCAC IIII=9IIIIIIDIIIIIIIIIIIIGIIIIIIIIII XA:i:0 MD:Z:36 NM:i:0
SRR015129.11 16 chr12 89953903 255 36M * 0 0 GTAAATGTATATATCCATGCGCGTACATAATCAAGC IGIII/A=I>III5?;III=I=ICIIIIIIIIIIII XA:i:0 MD:Z:36 NM:i:0
SRR015129.15 :6:1:877:106 length=36 4 * 0 0 * * 0 0 GGTTGGCTAGGTTTCCAGTACCAGGTATAATTTCCC IIIIIIIIIIIIIIIII?IIIIII4IIIIIIII=<. XM:i:1
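In the records above, the aligned reads have a read name cut at the first space (e.g. SRR015129.4), while the unaligned reads (FLAG 4) keep the full SRA name with its spaces. When a line like that is split on any whitespace, the second token is ":6:1:..." rather than the FLAG, which matches the "invalid literal for int()" failure at `bwflag = int(thisfields[1])` in the traceback. The sketch below is an illustration of that failure mode, not MACS's actual parser; the function names are hypothetical, and it assumes the columns in the real file are tab-delimited as the SAM format requires (the whitespace in the mail above is flattened).

```python
# Illustrative only: mimics the failing step from the traceback
# (bwflag = int(thisfields[1])); this is NOT MACS's own code.


def flag_split_on_whitespace(sam_line: str) -> int:
    """Read the FLAG column after splitting on any run of spaces or tabs."""
    fields = sam_line.split()
    # A read name like "SRR015129.6 :6:1:909:23 length=36" becomes three
    # fields, so index 1 holds ":6:1:909:23" instead of the FLAG.
    return int(fields[1])


def flag_split_on_tabs(sam_line: str) -> int:
    """Read the FLAG column after splitting on tabs only (SAM is tab-delimited)."""
    fields = sam_line.rstrip("\r\n").split("\t")
    return int(fields[1])


# One of the unaligned records from the thread, assuming tab-separated columns.
unmapped = "\t".join([
    "SRR015129.6 :6:1:909:23 length=36", "4", "*", "0", "0", "*", "*", "0", "0",
    "GCTGCTTCTCTNNTTAGAATGNNNNNNNNNNNNNNN",
    "IIII;II1I=I!!III=IAIA!!!!!!!!!!!!!!!",
    "XM:i:0",
])

print(flag_split_on_tabs(unmapped))  # 4
try:
    flag_split_on_whitespace(unmapped)
except ValueError as err:
    # Same kind of error as in the MACS traceback.
    print("ValueError:", err)
```

Splitting on tabs alone keeps the read name in one piece, which is why removing the spaces from the names (rather than changing the parser) is the practical workaround inside Galaxy.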
Maybe FASTQ Groomer should remove all spaces in a header to avoid this? It's tricky to say which tool is actually to 'blame' in this case (but not Galaxy!) :-). It's simple to fix on the command line, but has Galaxy got a search-and-replace function for users who encounter such problems?
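For a command-line fix, a minimal sketch of the same space-to-underscore translation is below. Both function names are hypothetical. `translate_fastq_names` works on the FASTQ before alignment (the equivalent of the Galaxy 'String Translate' recipe) and assumes plain four-line records with no wrapped sequence or quality lines; it also rewrites the '+' line, which in SRA files may repeat the read name. `translate_sam_qnames` instead rewrites only the first (read name) column of an existing SAM file, which avoids re-running Bowtie.

```python
from typing import Iterable, Iterator


def translate_fastq_names(lines: Iterable[str], old: str = " ", new: str = "_") -> Iterator[str]:
    """Replace `old` with `new` on the '@' name line and '+' line of each record.

    Assumes four-line records (name, sequence, '+', quality) with no blank or
    wrapped lines; that layout is an assumption, not checked here.
    """
    for index, line in enumerate(lines):
        if index % 4 in (0, 2):
            # Name line, or '+' line that may repeat the name.
            yield line.replace(old, new)
        else:
            # Sequence and quality lines pass through unchanged.
            yield line


def translate_sam_qnames(lines: Iterable[str], old: str = " ", new: str = "_") -> Iterator[str]:
    """Replace `old` with `new` in the first (read name) column of SAM alignment lines."""
    for line in lines:
        if line.startswith("@"):
            # Header lines (@HD, @SQ, @PG, ...) are left untouched.
            yield line
            continue
        # Split off only the first tab-delimited column; the rest is kept verbatim.
        qname, sep, rest = line.partition("\t")
        yield qname.replace(old, new) + sep + rest
```

Either function can be used as a stream filter, e.g. sys.stdout.writelines(translate_sam_qnames(sys.stdin)). As with the Galaxy tool, re-grooming a rewritten FASTQ is a sensible precaution.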
Steve
I tried a small SAM file (test-data/1.sam) on our test server and did not receive the error you listed. (No peaks were called due to the size of the file, but it seemed to recognize the format OK.)

Thanks,

Dan

On Apr 30, 2010, at 11:10 AM, Steve Taylor wrote:
On 30/04/2010 15:37, Daniel Blankenberg wrote:
Hi Steve,
The latest version in the repository and on the test server accepts bed, sam, bam, eland and elandmulti for single-end, and elandmulti only for paired-end. The version currently on the main public server accepts bed, sam, and bam for single-end, with paired-end not functioning. The newest version of MACS will be available on the main public server the next time it is updated.
Strange. On our local Galaxy instance I got
ERROR:root:Format "SAM" cannot be recognized!
and the Bowtie (SAM) output was generated from within Galaxy.
Any ideas why?
Steve
_______________________________________________ galaxy-dev mailing list galaxy-dev@lists.bx.psu.edu http://lists.bx.psu.edu/listinfo/galaxy-dev
_______________________________________________ galaxy-user mailing list galaxy-user@lists.bx.psu.edu http://lists.bx.psu.edu/listinfo/galaxy-user