Hi Mike,
Yes, to get the .wig file out of the data, select the first "892177" lines. (Selecting "892178" would include the second track line, which you don't want.)
After looking one more time, not all of the data appears to be .wig. This is a multiple track group file, labeled as .wig, but the second track is .bed, not .wig. The data didn't look right at the first-pass examination (the second track line didn't have the "type=wiggle_0" declaration), which is why I suggested in my original reply that you contact the data authors rather than attempt to manipulate the data yourself (and instead ask them to have it reviewed and resubmitted, or at least confirmed). It is now pretty clear what the merged file consists of: a .wig track followed by a .bed track. If you really want to try to use the data as-is, I would start by interpreting/labeling the first track as .wig and the second track as .bed (once split), and then carefully examine the results from any research you perform with it.
Apologies for the complicated file analysis,
Jen Galaxy team
---
Details about why the first track looks like a .wig file and the second track looks like a .bed file. NOTE: these example data have line counts added for clarification. When you select the data to create working files, use the original dataset without line counts.
All "variableStep" declaration lines come before the second track line at line 892,178; after that, the file continues to line 925,183 in bed format.
- Select on "Step"
  variableStep chrom=chr11 span=25 2
  variableStep chrom=chr10 span=25 74501
  variableStep chrom=chr13 span=25 119959
  variableStep chrom=chr12 span=25 152353
  variableStep chrom=chr15 span=25 185476
  variableStep chrom=chr14 span=25 224351
  variableStep chrom=chr17 span=25 253339
  variableStep chrom=chr16 span=25 298007
  variableStep chrom=chr19 span=25 325068
  variableStep chrom=chr18 span=25 352583
  variableStep chrom=chrM span=25 377622
  variableStep chrom=chr1 span=25 378109
  variableStep chrom=chr3 span=25 431654
  variableStep chrom=chr2 span=25 468728
  variableStep chrom=chr5 span=25 538115
  variableStep chrom=chr4 span=25 600376
  variableStep chrom=chr7 span=25 663953
  variableStep chrom=chr6 span=25 726093
  variableStep chrom=chr9 span=25 770436
  variableStep chrom=chrX span=25 819431
  variableStep chrom=chr8 span=25 830175
- Select first lines from dataset=10 - http://genome.ucsc.edu/goldenPath/help/wiggle.html
  track type=wiggle_0 visibility=full name="Smc3_mES" autoScale=on color=100,0,100 1
  variableStep chrom=chr11 span=25 2
  3000251 0.6 3
  3000276 1.5 4
  3000301 1.6 5
  3000326 1.7 6
  3000351 1.7 7
  3000376 1.7 8
  3000401 1.7 9
  3000426 1.6 10
- Select last lines from a dataset=33006 (calculated from 925183-892178+1) - http://genome.ucsc.edu/FAQ/FAQformat.html#format1
  track visibility=dense name="Smc3_mES enriched regions - 1e-09" color=100,0,100 892178
  chr11 3023275 3023700 892179
  chr11 3028200 3028225 892180
  chr11 3039225 3039275 892181
  chr11 3040500 3040525 892182
  chr11 3070325 3070375 892183
  chr11 3080650 3080675 892184
  chr11 3085850 3085950 892185
  chr11 3097450 3097475 892186
  chr11 3190200 3190275 892187
  (...more until end of file...)
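For anyone who prefers to check this outside of Galaxy, here is a minimal sketch of the same reasoning in Python. The function name guess_track_format and the heuristic itself are assumptions made for illustration, not a Galaxy or UCSC tool: a track group is called wig if it has the "type=wiggle_0" declaration or a variableStep/fixedStep line, and bed if its first data line looks like chrom/start/end. The sample lines are copied from the examples above, without the added line counts.

```python
def guess_track_format(track_lines):
    """Heuristic guess ('wig', 'bed' or 'unknown') for one track group.

    track_lines[0] is assumed to be the 'track ...' line itself.
    """
    header, body = track_lines[0], track_lines[1:]
    if "type=wiggle_0" in header:
        return "wig"
    if any(line.startswith(("variableStep", "fixedStep")) for line in body):
        return "wig"
    # Only the first data line is inspected: chrom, start, end.
    fields = body[0].split() if body else []
    if len(fields) >= 3 and fields[1].isdigit() and fields[2].isdigit():
        return "bed"
    return "unknown"


wig_track = [
    'track type=wiggle_0 visibility=full name="Smc3_mES" autoScale=on color=100,0,100',
    "variableStep chrom=chr11 span=25",
    "3000251 0.6",
]
bed_track = [
    'track visibility=dense name="Smc3_mES enriched regions - 1e-09" color=100,0,100',
    "chr11 3023275 3023700",
]

print(guess_track_format(wig_track))
print(guess_track_format(bed_track))

# Number of lines from the second track line through the end of the file.
last_line, second_track_line = 925183, 892178
second_track_count = last_line - second_track_line + 1
print(second_track_count)
```

This only looks at the track line and the first data line, so it is a screening aid at best; it does not validate the whole file.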
On 4/16/12 9:02 AM, Michael Sikes wrote:
Jen,
A couple of uninformed questions. I gather from your response that the author lab submitted a multiple track group .wig file instead of a single track group .wig file, and that I need to generate a single track group file before the bigwig conversion will work. So, with regard to the instructions below, I am to run the text manipulation on the original author-submitted .wig file, then run "filter and sort--Select lines that match an expression" on the newly created file, "Matching" the pattern "track". This generates yet another file that has the following info:
88: Select on data 87
1 line, 1 comments
format: wig, database: mm8
Info: Matching pattern: track

track type=wiggle_0 visibility=full name="Smc3_mES" autoScale=on color=100,0,100 1
track visibility=dense name="Smc3_mES enriched regions - 1e-09" color=100,0,100 892178
Is the number 892178 the number of track lines? If so, do I then do the "Remove beginning of a file" using 892178 on the original author .wig file?
Mike
On Apr 16, 2012, at 10:35 AM, Jennifer Jackson wrote:
Hi Mike,
I apologize if I wasn't clear, but the 'Select' was to show you how to identify the multi-track group wig files. I wanted to give you a way to screen similar files going forward.
The wig-to-bigWig program in Galaxy comes from UCSC. It accepts .wig files with a single track group as input: http://genome.ucsc.edu/goldenPath/help/bigWig.html (see step #1)
The data author lab can either submit the data as single track group .wig files, or, if you are confident that the multiple track group .wig format is expected and OK from this source, split the file. There are no specific tools in Galaxy to do this, but something like this would work:
- Text Manipulation -> "Add column", "1", Iterate? = yes
- "Select", "track"
- note the line number of track lines
- "Remove beginning of a file", using the line numbers and the -original- .wig file, to break it up into individual .wig files.
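If you are working outside of Galaxy, the same split can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name split_tracks is made up for this example, and it assumes every track group begins with a line that starts with "track". The sample lines stand in for the original file.

```python
def split_tracks(lines):
    """Group lines into one list per track.

    A new group starts at every line beginning with 'track'. Any lines
    before the first track line end up in a group of their own.
    """
    groups = []
    for line in lines:
        if line.startswith("track") or not groups:
            groups.append([])
        groups[-1].append(line)
    return groups


sample = [
    'track type=wiggle_0 visibility=full name="Smc3_mES" autoScale=on color=100,0,100',
    "variableStep chrom=chr11 span=25",
    "3000251 0.6",
    'track visibility=dense name="Smc3_mES enriched regions - 1e-09" color=100,0,100',
    "chr11 3023275 3023700",
]

groups = split_tracks(sample)
for number, group in enumerate(groups, start=1):
    print(number, len(group), group[0])
```

Each group could then be written to its own file, so that the conversion tool receives a single track group as input.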
Good luck!
Jen Galaxy team
On 4/16/12 6:57 AM, Michael Sikes wrote:
Jennifer,
Thanks for your help. I ran the filter and sort tool as advised, and then ran the wig to bigwig on the new history item generated by the filter. This time I got a different error:

84: Wig-to-bigWig on data 83
0 bytes
An error occurred running this job:
stdin is empty of data
Error running wigToBigWig.

83: Select on data 49
1 line, 1 comments
format: wig, database: mm8
Info: Matching pattern: track
Again, I'm sure I left off something obvious. Could you tell me what I did wrong?
Thanks, Mike
On Apr 13, 2012, at 1:27 PM, Jennifer Jackson wrote:
Hi Michael,
This particular .wig file has a data format problem that is the root cause of the conversion error. Specifically, there is an extra track line in the file. This can be found with Unix tools such as grep, or in Galaxy with the tool "Filter and Sort -> Select" by matching the pattern "track".
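As a rough equivalent of that check, here is a minimal Python sketch that reports each track line together with its line number. The function name find_track_lines is hypothetical, and it assumes track lines begin with the word "track" (the Galaxy Select tool matches the pattern anywhere in the line). More than one hit means the file holds multiple track groups.

```python
def find_track_lines(lines):
    """Return (line_number, text) for every line that starts with 'track'."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if line.startswith("track")
    ]


sample = [
    'track type=wiggle_0 visibility=full name="Smc3_mES" autoScale=on color=100,0,100\n',
    "variableStep chrom=chr11 span=25\n",
    "3000251\t0.6\n",
    'track visibility=dense name="Smc3_mES enriched regions - 1e-09" color=100,0,100\n',
    "chr11\t3023275\t3023700\n",
]

hits = find_track_lines(sample)
for number, text in hits:
    print(number, text)
```

For a real file, the list of sample lines could be replaced by an open file handle, since the function only iterates over lines.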
Ideally this would be corrected and resubmitted by the data author before use, since how/why this was inserted and what impact it has would need to be examined.
Since you noticed problems with other GEO files (conversion problems), verifying the .wig format and making any necessary corrections would also be advised.
Hopefully this helps!
Best,
Jen Galaxy team
Michael Sikes, Ph.D.
Associate Professor of Immunology
North Carolina State University
Microbiology Department
4524A Gardner Hall
Campus Box 7615
Raleigh, NC 27695
Ph: 919-513-0528
Fax: 919-515-7867
email: mlsikes@ncsu.edu