Hi Luce,

I'm forwarding this question to the Galaxy-User mailing list, as I think this is a pretty common situation.

Here's how I replace text in a column. It's a two-step process for each dataset.

First, go to Text Manipulation -> Compute.

In the Add expression text box, enter

columnNum.replace("oldVal", "newVal")

In your case I think this is

c4.replace("MACS_peak_", "treatment1_peak_", 1)

"replace" is a Python string method, and c4 is the string column we are working on. I added the 1 out of paranoia: it tells Galaxy to replace only the first occurrence of the old string in each line's c4 value. Without it, every occurrence would be replaced, so take care not to replace more than you intend.

Executing this will create a dataset with a new column at the end.

Now use the Text Manipulation -> Cut operation to substitute the new column in place of the old column.

Does that do the trick?

Thanks,

Dave C.

On Thu, Dec 8, 2011 at 4:24 PM, las2017 <las2017@med.cornell.edu> wrote:
I have two ChIPSeq datasets, and I am trying to find the common and distinct peaks between them and visualize them. I end up with a MACS bed file for each (listing a bunch of MACS_peaks). I then use the Intersect and Subtract tools from the Genomic Intervals tab and end up with the peaks I want. However, because of the way that MACS names its peaks, there can end up being some peaks named the same way in both files (because, for example, peak 20 in file1 is from position 300,000-300,500 but peak 20 in file 2 is from position 320,000-320,500). So, I can end up with multiple peaks with the same name. Because all the peak names have the same form, it can also be difficult to tell them apart when visualizing them in the UCSC Genome Browser.
What I would like to do is to be able to edit the bed file to change the text MACS_peak_<number> to, say, treatment1_peak_<number> so that peak 20 would now still be numbered 20 in both files, but would have a different label. This would be pretty easy to do using regular expressions and sed.
I know there have been a few posts about text manipulation, and I know that there is a text manipulation tab, but I can't seem to find an easy way to do what I want to do.
Any advice?
Thanks, luce
-- http://galaxyproject.org/ http://getgalaxy.org/ http://usegalaxy.org/ http://galaxyproject.org/wiki/
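For anyone who wants to check what the trailing 1 in the expression above does before running it in Galaxy, here is a minimal sketch in plain Python. The peak names are made up for illustration; the second one deliberately contains the old prefix twice, which real MACS output may never do.

```python
# Hypothetical peak names; the second one contains the old prefix twice.
names = ["MACS_peak_20", "MACS_peak_MACS_peak_7"]

# Without a count, str.replace replaces every occurrence.
replaced_all = [n.replace("MACS_peak_", "treatment1_peak_") for n in names]

# With a count of 1, only the first occurrence in each string is replaced.
replaced_first = [n.replace("MACS_peak_", "treatment1_peak_", 1) for n in names]

print(replaced_all)
print(replaced_first)
```

For ordinary names such as MACS_peak_20 the two forms give the same result, so the count only matters when the old text could appear more than once in the column.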
Thanks! This worked perfectly.

luce

(In reply to Dave Clements' message of 9 Dec, 2011, 2:31 PM.)
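The two-step approach from this thread (Compute appends a rewritten column, then Cut puts it where the old one was) can be sketched outside Galaxy roughly as follows. This is only an illustration in plain Python, not how Galaxy implements either tool. The function name compute_then_cut, the sample BED line, and its score value are all hypothetical.

```python
def compute_then_cut(line, col=4, old="MACS_peak_", new="treatment1_peak_"):
    """Mimic Compute followed by Cut on one tab-separated line (sketch only)."""
    fields = line.rstrip("\n").split("\t")
    # Step 1 (like Compute): append the rewritten value as a new last column.
    fields.append(fields[col - 1].replace(old, new, 1))
    # Step 2 (like Cut): put the new column where the old one was, drop the old.
    n = len(fields)
    order = list(range(col - 1)) + [n - 1] + list(range(col, n - 1))
    return "\t".join(fields[i] for i in order)


# Hypothetical 5-column BED line; the name is in column 4.
bed_line = "chr1\t300000\t300500\tMACS_peak_20\t85.3"
result = compute_then_cut(bed_line)
print(result)
```

For a 5-column file like this one, the column order kept in step 2 corresponds to selecting c1,c2,c3,c6,c5, assuming the Cut tool is given columns in that form.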
participants (2): Dave Clements, Lucy A. Skrabanek