Re: [galaxy-user] wig to bigwig error

16 Apr 2012

Hi Mike,

Yes, to get the .wig file out of the data, select the first "892177" 
lines. (Selecting "892178" would include the second track line, which 
you don't want).

After looking one more time, not all of the data appears to be .wig.
This is a multiple track group file, labeled as .wig, but the second
track is .bed, not .wig. The data didn't look right on the first pass
(the second track line didn't have the "type=wiggle_0" declaration),
which is why, in my original reply, I suggested contacting the data
authors to have the file reviewed and resubmitted, or at least
confirmed, rather than attempting to manipulate the data yourself. It
is now pretty clear what the merged file consists of: .wig + .bed. If
you really want to use the data as-is, I would start by splitting the
file, labeling the first track as .wig and the second track as .bed,
and then carefully examining the results of any analysis you perform
with it.

Apologies for the complicated file analysis,

Jen
Galaxy team

---

Details about why the first track looks like a .wig file and the second
track looks like a .bed file. NOTE: these example data have line numbers
added as a last column for clarification. When you select the data to
create working files, use the original dataset without line numbers.

All "variableStep" declaration lines come before the second track line
at 892,178, and after that the file continues to line 925,183 in .bed
format.

- Select on "Step"
variableStep chrom=chr11 span=25	2
variableStep chrom=chr10 span=25	74501
variableStep chrom=chr13 span=25	119959
variableStep chrom=chr12 span=25	152353
variableStep chrom=chr15 span=25	185476
variableStep chrom=chr14 span=25	224351
variableStep chrom=chr17 span=25	253339
variableStep chrom=chr16 span=25	298007
variableStep chrom=chr19 span=25	325068
variableStep chrom=chr18 span=25	352583
variableStep chrom=chrM span=25	377622
variableStep chrom=chr1 span=25	378109
variableStep chrom=chr3 span=25	431654
variableStep chrom=chr2 span=25	468728
variableStep chrom=chr5 span=25	538115
variableStep chrom=chr4 span=25	600376
variableStep chrom=chr7 span=25	663953
variableStep chrom=chr6 span=25	726093
variableStep chrom=chr9 span=25	770436
variableStep chrom=chrX span=25	819431
variableStep chrom=chr8 span=25	830175
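
Outside of Galaxy, the same screen ("Add column" to number the lines,
then "Select" on a pattern) could be sketched in Python roughly as
below. This is only an illustration, not what the Galaxy tools run
internally: the function name find_declarations and the default prefix
list are my own choices, and the short sample stands in for the real
file.

```python
from typing import Iterable, List, Tuple


def find_declarations(
    lines: Iterable[str],
    prefixes: Tuple[str, ...] = ("track", "variableStep", "fixedStep"),
) -> List[Tuple[int, str]]:
    """Return (1-based line number, line text) for each declaration line.

    Hypothetical helper: a line counts as a declaration when it starts
    with one of the given prefixes.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if line.startswith(prefixes):
            hits.append((number, line.rstrip("\n")))
    return hits


# Tiny stand-in for the real dataset (no line-number column here).
sample = [
    'track type=wiggle_0 visibility=full name="Smc3_mES"\n',
    "variableStep chrom=chr11 span=25\n",
    "3000251\t0.6\n",
    'track visibility=dense name="Smc3_mES enriched regions - 1e-09"\n',
    "chr11\t3023275\t3023700\n",
]

for number, text in find_declarations(sample):
    print(number, text, sep="\t")
```

For a real file, passing the open file handle in place of the sample
list should work the same way, since the function only iterates over
lines. More than one hit on "track" is the sign of a multiple track
group file.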

- Select first lines from dataset=10
- http://genome.ucsc.edu/goldenPath/help/wiggle.html
track type=wiggle_0 visibility=full name="Smc3_mES" autoScale=on 
color=100,0,100	1
variableStep chrom=chr11 span=25	2
3000251	0.6	3
3000276	1.5	4
3000301	1.6	5
3000326	1.7	6
3000351	1.7	7
3000376	1.7	8
3000401	1.7	9
3000426	1.6	10

- Select last lines from a dataset= 33006 (calculated from 925183-892178+1)
- http://genome.ucsc.edu/FAQ/FAQformat.html#format1
track visibility=dense name="Smc3_mES enriched regions - 1e-09" 
color=100,0,100	892178
chr11	3023275	3023700	892179
chr11	3028200	3028225	892180
chr11	3039225	3039275	892181
chr11	3040500	3040525	892182
chr11	3070325	3070375	892183
chr11	3080650	3080675	892184
chr11	3085850	3085950	892185
chr11	3097450	3097475	892186
chr11	3190200	3190275	892187
(...more until end of file...)
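
If you do decide to split the file yourself, the idea above (start a new
piece at each track line, then label each piece by what it looks like)
could be sketched in Python as follows. This is a rough sketch under my
own assumptions, not a Galaxy or UCSC tool: split_tracks, guess_format
and split_file are hypothetical names, and the format guess only checks
for the "type=wiggle_0" declaration or a bed-like first data line. Run
it on the original dataset, without the added line numbers.

```python
from typing import Iterable, List


def split_tracks(lines: Iterable[str]) -> List[List[str]]:
    """Group lines into tracks; every 'track' line starts a new group."""
    tracks: List[List[str]] = []
    for line in lines:
        if line.startswith("track") or not tracks:
            tracks.append([])
        tracks[-1].append(line)
    return tracks


def guess_format(track: List[str]) -> str:
    """Rough guess only: 'wig', 'bed', or 'unknown'."""
    header = track[0] if track else ""
    if header.startswith("track") and "type=wiggle_0" in header:
        return "wig"
    for line in track:
        if line.startswith(("track", "#", "browser")):
            continue
        fields = line.split()
        # bed-like: chrom, then integer start and end columns
        if len(fields) >= 3 and fields[1].isdigit() and fields[2].isdigit():
            return "bed"
        break  # only the first data line is inspected
    return "unknown"


def split_file(path: str) -> None:
    """Write each track to '<path>.track<N>.<guessed format>'."""
    with open(path) as handle:
        tracks = split_tracks(handle)
    for index, track in enumerate(tracks, start=1):
        out_path = f"{path}.track{index}.{guess_format(track)}"
        with open(out_path, "w") as out:
            out.writelines(track)
```

The guessed label is only a starting point; each output file should
still be checked against the format descriptions linked above before
converting or analyzing it.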

On 4/16/12 9:02 AM, Michael Sikes wrote:
...
Jen,
A couple of uninformed questions. I gather from your response that the
author lab submitted a multiple track group .wig file instead of a
single track group .wig file, and that I need to generate a single track
group file before the bigwig conversion will work. So, with regard to
the instructions below, I am to run the text manipulation on the
original author-submitted .wig file, then run "Filter and Sort -> Select
lines that match an expression" on the newly created file, "Matching"
the pattern "track". This generates yet another file that
has the following info:
88: Select on data 87
1 line, 1 comments
format: wig, database: mm8
Info: Matching pattern: track
track type=wiggle_0 visibility=full name="Smc3_mES" autoScale=on color=100,0,100	1
track visibility=dense name="Smc3_mES enriched regions - 1e-09"  color=100,0,100	892178
Is the number 892173 the number of track lines? If so, do I then do the
"Remove beginning of a file" using 892178 on the original author .wig file?
Mike
...
On Apr 16, 2012, at 10:35 AM, Jennifer Jackson wrote:
...
Hi Mike,
I apologize if I wasn't clear, but the 'Select' was to show you how
to identify the multi-track group wig files. I wanted to give you a
way to screen similar files going forward.
The wig-to-bigWig program in Galaxy comes from UCSC. It accepts .wig
files with a single track group as input:
http://genome.ucsc.edu/goldenPath/help/bigWig.html (see step #1)
The data author lab can either submit the data as single track group
.wig files, or, if you are confident that the multiple track group
.wig format is expected and OK from this source, split the file.
There are no specific tools in Galaxy to do this, but something like
this would work:
- Text Manipulation -> "Add column", "1", Iterate? = yes
- "Select", "track"
- note the line number of track lines
- "Remove beginning of a file", using line numbers, and the
  -original- .wig file, to break it up into individual .wig files.
Good luck!
Jen
Galaxy team
On 4/16/12 6:57 AM, Michael Sikes wrote:
...
Jennifer,
Thanks for your help. I ran the filter and sort tool as advised, and
then ran the wig to bigwig on the new history item generated by the
filter. This time I got a different error:
84: Wig-to-bigWig on data 83
0 bytes
An error occurred running this job: stdin is empty of data
Error running wigToBigWig.
83: Select on data 49
1 line, 1 comments
format: wig, database: mm8
Info: Matching pattern: track
Again, I'm sure I left off something obvious. Could you tell me what I
did wrong?
Thanks,
Mike
On Apr 13, 2012, at 1:27 PM, Jennifer Jackson wrote:
...
Hi Michael,
This particular .wig file has a data format problem that is the root
cause of the conversion error. Specifically, there is an extra track
line in the file. This can be found using unix tools with a grep or in
Galaxy with the tool "Filter and Sort -> Select" by matching the
pattern "track".
Ideally this would be corrected and resubmitted by the data author
before use, since how/why this was inserted and what impact it has
would need to be examined.
Since you noticed problems with other GEO files (conversion problems),
verifying the .wig format and making any necessary corrections would
also be advised.
Hopefully this helps!
Best,
Jen
Galaxy team
Michael Sikes, Ph.D.
Associate Professor of Immunology
North Carolina State University
Microbiology Department
4524A Gardner Hall
Campus Box 7615
Raleigh, NC 27695
Ph: 919-513-0528
Fax: 919-515-7867
email: mlsikes@ncsu.edu <mailto:mlsikes@ncsu.edu>
-- 
Jennifer Jackson
http://galaxyproject.org