I've updated the unix-tools package.
1. Created a combined "Find and Replace" tool, which works on either lines
or columns, and supports both simple-string and regular-expression find and replace.
The purpose of this tool is to give users a way to replace text in
tabular text files. Without it, users need to save the file and
perform the replacements in Excel. Being a Perl script, it works on
large files with millions of lines (on which Excel chokes).
A usage example would be:
Find all words in column 2 that start with a digit, and add a "chr"
prefix, effectively converting those Drosophila "4L" chromosomes into
see screen shot at:
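
To make the usage example above concrete, here is a minimal Python sketch of a regular-expression replace restricted to one column. It is not the Perl script itself: the function name `replace_in_column`, the tab delimiter, and the sample rows are assumptions made only for illustration.

```python
import re

def replace_in_column(lines, column, pattern, replacement):
    """Apply a regex replacement to one tab-separated column (1-based).

    Lines are processed one at a time, so the whole file never has to
    fit in memory. All names here are hypothetical, not the tool's API.
    """
    regex = re.compile(pattern)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Leave lines that are too short for the requested column untouched.
        if column <= len(fields):
            fields[column - 1] = regex.sub(replacement, fields[column - 1])
        yield "\t".join(fields)

# Made-up sample rows: column 2 holds the chromosome name.
rows = ["gene1\t4L\t100", "gene2\tchrX\t200"]

# Add a "chr" prefix only to values in column 2 that start with a digit.
result = list(replace_in_column(rows, 2, r"^(\d)", r"chr\1"))
```

Because the function is a generator, it can be fed an open file handle directly, which is what keeps this approach usable on files with millions of lines.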
2. Select lines by Word-List tool.
This tool accepts two files: one which will be filtered,
the other contains a list of words to match.
If a line from the first file matches one of the words from the other
file, it is printed to the output dataset.
This tool allows similar functionality as the "advanced filter" option
While it is possible to achieve the same functionality by building a regular
expression and using Galaxy's native "select" tool, using this tool is
easier and more intuitive (IMHO).
Furthermore, this tool can be used as part of a workflow.
A usage workflow example:
Get the DM3 repeat masker track from UCSC,
Group by CLASS + count,
Sort descending by count,
Select first 10 lines,
Cut first column (this is the word list to filter by).
Then use this tool to filter the repeat masker file with the words in
the word list.
full information for the top ten classes from the repeat masker track.
see screen shot at:
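
The filtering step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `select_lines_by_words` is hypothetical, the sample rows are made up, and a line is taken to "match" when one of its whitespace-separated fields equals a listed word exactly. The actual tool may match differently.

```python
def select_lines_by_words(lines, words):
    """Yield the lines that contain at least one word from the word list.

    Assumption: a match means a whole whitespace-separated field is
    equal to one of the words (not a substring match).
    """
    wanted = {w.strip() for w in words if w.strip()}
    for line in lines:
        if any(token in wanted for token in line.split()):
            yield line

# Made-up rows loosely shaped like a repeat masker track.
rows = [
    "chr2L\t100\t200\tLINE",
    "chr2L\t300\t400\tSimple_repeat",
    "chr3R\t500\t600\tLTR",
]
# The word list, as it would be read from the second file.
words = ["LINE\n", "LTR\n"]

selected = list(select_lines_by_words(rows, words))
```

Keeping the words in a set makes each lookup constant-time, so the cost grows with the size of the filtered file rather than with the length of the word list.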
Comments are welcomed,
I've developed a few tools for Galaxy and I think I ran into a bug that
still exists in the latest version. As you know, a Galaxy server
keeps the external dataset ID (i.e. the one visible in the web URLs)
equal to the internal filesystem dataset ID (i.e. the names in
database/files/000/) as long as no user on the server has shared any
histories (and their datasets). Once sharing starts, the external
dataset IDs begin to differ from the internal dataset IDs: they are
always higher, and Galaxy handles this transparently. But this
behavior seems to be broken for the output files_path property.
If you have a tool which uses the output files_path property like this
one I have:
<command interpreter="perl">search.pl $query_list $output1
On my test server I've shared one history with a single dataset. So my
external-internal offset is 1. The above tool then produces the
Galaxy is not generating the correct files_path: it should end in
dataset_31_files, not 32. This bug completely breaks the tool
when you try to view the output in the browser. I tried to
work around it with symlinks, but that cannot fix the problem,
because the symlinks start colliding with existing directories once you run the
tool more than once.
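
To spell out the mismatch, here is a small Python illustration. It is not Galaxy's code: `files_path_for` is a hypothetical helper, and the only facts taken from my setup are the directory database/files/000, the offset of 1, and the 31 versus 32 suffixes.

```python
import os

def files_path_for(files_dir, dataset_id):
    """Build a dataset_<id>_files directory name (hypothetical helper)."""
    return os.path.join(files_dir, "dataset_%d_files" % dataset_id)

internal_id = 31                     # ID used for the files on disk
offset = 1                           # one shared dataset on my test server
external_id = internal_id + offset   # ID visible in the web URLs

# What I expect the tool to receive versus what it actually receives.
expected = files_path_for("database/files/000", internal_id)
reported = files_path_for("database/files/000", external_id)
```

The two paths differ by exactly the offset, which is why the directory the tool writes to is not the one the browser view looks in.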
Thanks for any help on how to fix the problem,
I would like to share a little tool for Galaxy that I wrote.
The tool takes FASTA files as input, keeps the header of each sequence, but
shuffles the sequence itself into a random order. The output file is therefore
a FASTA file with sequences of the same base composition and length, but
It is a very simple idea but gives me the opportunity to statistically
evaluate certain analyses for "real" libraries with the newly created "in
I hope some of you guys find this tool useful, comments are more than
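
For anyone curious, the idea can be sketched in Python roughly as follows. This is not the tool's actual source: the function names `read_fasta` and `shuffle_fasta`, the optional `seed` parameter, and the sample records are assumptions for illustration, and the sketch returns each sequence as one unwrapped string.

```python
import random

def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def shuffle_fasta(lines, seed=None):
    """Keep each header but shuffle the residues of its sequence.

    Shuffling permutes the existing characters, so base composition
    and length are preserved. `seed` is a hypothetical convenience
    for reproducible runs.
    """
    rng = random.Random(seed)
    for header, sequence in read_fasta(lines):
        residues = list(sequence)
        rng.shuffle(residues)
        yield header, "".join(residues)

# Made-up input: the first record spans two sequence lines.
records = list(shuffle_fasta([">seq1", "ACGTAC", "GGTT", ">seq2", "TTTTAA"], seed=0))
```

Each output record has the same header, length, and base counts as its input, only the order of the residues changes.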