Lastz very slow
by Andrew South
Hi - is there a reason Lastz is very slow right now? I am mapping 2,000-3,000 reads against a single 11 kb sequence and find that jobs either return an error or take 16-24 hours to finish. Two weeks ago the same jobs took about 30 minutes. Is there a way to speed this up? Thanks in advance for any help. Andy
The University of Dundee is a registered Scottish charity, No: SC015096
11 years, 1 month
About FTP setup
by Richard Liao
Dear all,
I'm a student at Fudan University in China. I am trying to enable FTP
uploading in my local instance of Galaxy.
I followed the settings in
http://wiki.g2.bx.psu.edu/Admin/Config/Upload%20via%20FTP, but I ran into the
following problems:
1. galaxyftp seems to be a PostgreSQL database. Should I create it myself, or
does it already exist in the Galaxy directory?
2. galaxy_user seems to be a table. Should I create it myself? And how
should I update it and integrate it with Galaxy?
Thanks for your attention, and looking forward to your reply!
Best wishes
Liao Ruiqi
11 years, 1 month
GI error/shift in the output of megablast ?
by Sandrine Hughes
Dear all,
I am having trouble with the Megablast program available in NGS Mapping
and I hope that you can help. I think that there might be a
problem with the output table, specifically a shift between
the GI numbers and the values associated with them.
Here are the details:
I. First, what I have done :
I used the program to identify the species that I have in a mix of
sequences by using the following options:
Database nt 27-Jun-2011
Word size 16
Identity 90.0
Cutoff 0.001
Filter out low complexity regions Yes
I ran the analyses twice and obtained exactly the same results (I
used the online version of Galaxy, not a local one).
II. Second, I analysed the data obtained for one of my sequences
(1-202). The following lines are the beginning of the table that I
obtained after the megablast, followed by two problematic lines:
1-202 312182292 484 99.33 150 1 0 1 150 1 150 2e-75 289.0
1-202 312182201 476 99.33 150 1 0 1 150 1 150 2e-75 289.0
1-202 308228725 928 99.33 150 1 0 1 150 19 168 2e-75 289.0
1-202 308228711 938 99.33 150 1 0 1 150 22 171 2e-75 289.0
1-202 308197083 459 99.33 150 1 0 1 150 10 159 2e-75 289.0
1-202 300392378 920 99.33 150 1 0 1 150 10 159 2e-75 289.0
1-202 300392376 918 99.33 150 1 0 1 150 9 158 2e-75 289.0
1-202 300392375 922 99.33 150 1 0 1 150 11 160 2e-75 289.0
1-202 300392374 931 99.33 150 1 0 1 150 21 170 2e-75 289.0
1-202 300392373 909 99.33 150 1 0 1 150 21 170 2e-75 289.0
1-202 300392371 1172 99.33 150 1 0 1 150 9 158 2e-75 289.0
...
1-202 179366399 151762 98.67 150 2 0 1 150 46880 47029 6e-73 281.0
1-202 58617849 511 98.67 150 2 0 1 150 21 170 6e-73 281.0
III. Third, what I’ve noticed:
My first problem was that among all the species identified, two
were very different from the expected ones (the last 2 lines). So I
decided to check whether that was plausible for this sequence and
independently ran a megablast at NCBI with similar options.
I was not able to find these two species in the results.
I then checked the hits listed in the table above and
found a second problem. In the table, the second column gives the
GI of the database hit and the third column gives the length of the
database hit. However, when I manually checked the length for a GI
at NCBI, the length in the table was incorrect. For example, for GI 312182292, the
length should be 580 and not 484.
By checking different lines, I noticed that the length
given for a GI corresponds to the length of GI-1. As you can see
in the above table, some GIs are consecutive (300392376,
300392375,...). When checking the length of 300392376 in NCBI, I
should have 920. But when I checked 300392375, I found 918. And this
was true for the following lines: 300392374 normally gives 922 and
300392373 gives 931... My conclusion at that point was that there is a
shift of –1 between the GI and the other values on the line
(indeed the values in the remaining columns agree with
the length of GI-1). However, that’s not always true... For some
GIs given in the table (for example, the last two lines), if we check
the values for GI-1, they are completely different...
So I suppose that there is a problem in the GI sorting during the
megablast, but I’m not able to clearly define the problem.
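The GI-1 pattern described above can be checked mechanically. The following is only a minimal sketch: it uses the three consecutive GIs for which this message quotes both a table length and an NCBI length, and the names `table_len`, `ncbi_len`, and `shifted_by_one` are made up for illustration.

```python
# Lengths reported in the Galaxy megablast table (column 3), keyed by GI (column 2).
table_len = {300392376: 918, 300392375: 922, 300392374: 931}

# Lengths found by looking up GIs at NCBI, as quoted in this message.
ncbi_len = {300392375: 918, 300392374: 922, 300392373: 931}


def shifted_by_one(gi):
    """Return True if the table length for `gi` equals the NCBI length of gi - 1."""
    return gi in table_len and table_len[gi] == ncbi_len.get(gi - 1)


def check_all():
    """Apply the GI-1 check to every GI in the table excerpt."""
    return {gi: shifted_by_one(gi) for gi in sorted(table_len, reverse=True)}


if __name__ == "__main__":
    for gi, shifted in check_all().items():
        print(gi, "matches GI-1" if shifted else "does not match GI-1")
```

The same check would report a mismatch for hits such as the last two lines of the table, where the values do not correspond to GI-1.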
IV. Fourth, confirmed with another dataset
To be sure that the problem was not linked to my data or
my process, I asked a colleague to run a megablast on independent data.
The conclusions were similar to mine: a shift between the GI given in the
table and the values on the same line, which most of the
time, but not always, correspond to GI-1.
Can you confirm that there is a problem with the output of the
megablast available in Galaxy? If yes, do you think there is a way to
fix it?
Thanks a lot,
Sandrine
11 years, 1 month
Help needed in troubleshooting cufflinks
by Vijay T
Hi Team,
This is Vijay from ELogic Technologies Pvt Ltd, Bangalore, India.
1. Using Galaxy local
2. NA
3. Galaxy was downloaded using the tar files
4. NA
5. When I tried to install Cufflinks on my system, I got the following error.
In file included from hits.h:21:0,
from abundances.h:20,
from differential.cpp:17:
common.h: At global scope:
common.h:25:25: error: ‘boost::BOOST_FOREACH’ has not been declared
In file included from differential.h:29:0,
from differential.cpp:18:
replicates.h: In member function ‘bool ReplicatedBundleFactory::next_bundle(HitBundle&)’:
replicates.h:141:50: warning: unused variable ‘s2’ [-Wunused-variable]
differential.cpp: In member function ‘void TestLauncher::perform_testing(std::vector<boost::shared_ptr<SampleAbundances> >&)’:
differential.cpp:111:31: warning: unused variable ‘s2’ [-Wunused-variable]
differential.cpp: In function ‘void sample_abundance_worker(const string&, SampleAbundances&, HitBundle*, bool, bool)’:
differential.cpp:704:41: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
make[2]: *** [differential.o] Error 1
make[2]: Leaving directory `/home/binet/Downloads/cufflinks-1.2.0/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/binet/Downloads/cufflinks-1.2.0'
make: *** [all] Error 2
I have even tried various versions of Cufflinks, but I get the same error.
Please help me resolve this error.
Thanks and Regards
Vijay
11 years, 1 month
enabling regular users to upload large data volumes to a local Galaxy server
by Yury V Bukhman
Hi,
we are running a local Galaxy server, administered by a bioinformatics
core group. Our end users increasingly come to us with sets of large
NGS files that they can't upload to Galaxy on their own through a web
browser. We copy their data to a Galaxy filesystem and upload into data
libraries from there using the admin interface. However, the users
would prefer to be able to get their data onto the server on their own.
What's the best solution to that? Should we set up FTP upload? Are
there other tricks? Any advice would be appreciated.
Thanks.
Yury
--
Yury V. Bukhman, Ph.D.
Associate Scientist, Bioinformatics Group Leader
Great Lakes Bioenergy Research Center
University of Wisconsin - Madison
445 Henry Mall, Rm. 513
Madison, WI 53706, USA
Phone: 608-890-2680 Fax: 608-890-2427
Email: ybukhman(a)glbrc.wisc.edu
11 years, 1 month
Help
by Giuseppe Petrosino
Hi,
I read in the README for MACS that: "For the experiment with several
replicates, it is recommended to concatenate several ChIP-seq treatment
files into a single file".
I have Illumina ChIP-seq data: two files for IP samples and two files
for Control samples. Is it right to use Concatenate datasets (Text
Manipulation) and then use MACS for peak calling?
Thank you so much.
Giuseppe
11 years, 1 month
EMBOSS fuzzpro tool pattern is not read correctly from the input
by domantas.motiejunas@cropdesign.com
Hello,
I'm trying to run the EMBOSS fuzzpro tool; however, I get an "Illegal character"
error for '_', apparently from the fuzzpro tool itself.
One of the input parameters is an amino acid sequence pattern. For instance, I
submit AV[RL]E, but somehow it gets converted and passed to
fuzzpro as AV__ob__RL__cb__E, and then apparently these '_' characters cause
the error.
I tested the tool on command line and it works fine.
Also it works on Galaxy if I submit AVRE (just amino acid letters no
special characters for pattern)
So basically seems that in my input pattern string AV[RL]E
the character [ is somehow converted into __ob__ and character ] is
converted into __cb__
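The substitution described above can be illustrated, and undone, with a few lines of Python. This is only a sketch: it covers just the two tokens reported in this message, the function name `restore_pattern` is made up, and where such a step would belong in a local Galaxy (for example in a wrapper script around fuzzpro) is an assumption.

```python
# Only the two substitutions reported in this message are listed here.
SANITIZED_TOKENS = {"__ob__": "[", "__cb__": "]"}


def restore_pattern(value):
    """Replace the reported placeholder tokens with the original characters."""
    for token, char in SANITIZED_TOKENS.items():
        value = value.replace(token, char)
    return value


if __name__ == "__main__":
    print(restore_pattern("AV__ob__RL__cb__E"))
```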
Any ideas how to fix this?
Thanks,
Domantas
11 years, 1 month
Genomic interval file for GATK
by Praveen Raj Somarajan
All,
I'm using a locally installed Galaxy with GATK 1.3 beta (recently updated). I am interested in variant calling using GATK on both Illumina and SOLiD data. My questions are:
1) What format does the "Genomic Interval" option accept in the beta version? It produced an error when I provided a BED file (enrichment coordinates). DepthOfCoverage also produced an error when I used BED files. Does the beta release (v1.3) accept BED files as input for genomic intervals?
2) SAMtools index seems to be missing in Galaxy. Is this true, or does another module (say SAM->BAM) incorporate this functionality?
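If a plain interval list turns out to be needed instead of BED, the conversion is short. The sketch below assumes the target is the `chr:start-end` interval string form, which is 1-based and inclusive, whereas BED is 0-based and half-open. Whether the Galaxy wrapper for GATK 1.3 expects exactly this is not established in this message, and the function name `bed_to_intervals` is made up for illustration.

```python
def bed_to_intervals(bed_lines):
    """Convert BED records (0-based, half-open) to chr:start-end strings (1-based, inclusive)."""
    intervals = []
    for line in bed_lines:
        # Skip blank lines and the usual BED header/comment lines.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end = line.split()[:3]
        # Only the start coordinate changes: BED start 99 is position 100.
        intervals.append(f"{chrom}:{int(start) + 1}-{end}")
    return intervals


if __name__ == "__main__":
    for interval in bed_to_intervals(["chr1\t99\t200\tregion1"]):
        print(interval)
```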
Looking forward to your comments.
Raj
OCIMUMBIO SOLUTIONS (P) LTD
11 years, 1 month
Re: [galaxy-user] Text Editing
by Dave Clements
Hi Luce,
I'm forwarding this question to the Galaxy-User mailing list, as I think
this is a pretty common situation.
Here's how I replace text in a column. It's a two step process for each
dataset.
First go to Text Manipulation -> Compute.
In the Add expression text box enter
columnNum.replace("oldVal", "newVal")
In your case I think this is
c4.replace("MACS_peak_", "treatment1_peak_", 1)
"replace" is a Python string operation, and c4 is the string
column we are working on. I added the 1 out of paranoia. This
tells Galaxy to replace only the first occurrence of the old string in
each line. Care must be taken to avoid replacing more than you want.
Executing this will create a dataset with a new column at the end.
Now, use the Text Manipulation -> Cut operation to substitute the new
column in place of the old column.
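For anyone who wants to try the expression outside Galaxy, here is a minimal Python sketch of the same replacement. It rewrites column 4 in place rather than appending a new column and cutting, and the function name `rename_column` and the sample line are made up for illustration.

```python
def rename_column(line, old, new, col=4):
    """Replace the first occurrence of `old` in one tab-separated column of a line."""
    fields = line.rstrip("\n").split("\t")
    # The count argument of 1 limits the change to the first occurrence.
    fields[col - 1] = fields[col - 1].replace(old, new, 1)
    return "\t".join(fields)


if __name__ == "__main__":
    # Illustrative record only; the chromosome and score are invented.
    sample = "chr1\t300000\t300500\tMACS_peak_20\t55.1"
    print(rename_column(sample, "MACS_peak_", "treatment1_peak_"))
```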
Does that do the trick?
Thanks,
Dave C.
On Thu, Dec 8, 2011 at 4:24 PM, las2017 <las2017(a)med.cornell.edu> wrote:
> I have two ChIPSeq datasets, and I am trying to find the common and
> distinct peaks between them and visualize them. I end up with a MACS bed
> file for each (listing a bunch of MACS_peaks). I then use the Intersect and
> Subtract tools from the Genomic Intervals tab and end up with the peaks I
> want. However, because of the way that MACS names its peaks, there can end
> up being some peaks named the same way in both files (because, for example,
> peak 20 in file1 is from position 300,000-300,500 but peak 20 in file 2 is
> from position 320,000-320,500). So, I can end up with multiple peaks with
> the same name. Because all the peak names have the same form, it can also
> be difficult to tell them apart when visualizing them in the UCSC Genome
> Browser.
>
> What I would like to do is to be able to edit the bed file to change the
> text MACS_peak_<number> to, say, treatment1_peak_<number> so that peak 20
> would now still be numbered 20 in both files, but would have a different
> label. This would be pretty easy to do using regular expressions and sed.
>
> I know there have been a few posts about text manipulation, and I know
> that there is a text manipulation tab, but I can't seem to find an easy way
> to do what I want to do.
>
> Any advice?
>
> Thanks, luce
>
--
http://galaxyproject.org/
http://getgalaxy.org/
http://usegalaxy.org/
http://galaxyproject.org/wiki/
11 years, 1 month
Text Editing
by Jennifer Jackson
Hello Luce,
I can explain how to use the "Text Manipulation" tools. For each file
independently, the following steps will rename the "name" identifier in
column 4. I don't believe that there is a more direct method, but you may
discover one. This type of customization is why the tools are distinct -
so they can be used in sequence to do many of the same text
manipulations as on the Unix command line. There is a biosed command as
part of EMBOSS, but that tool works on sequence text, not text files in
general.
To save time in the future, these steps can be put into a workflow, with
an edit of step 2 to customize the new ID text as needed when run.
Starting with a 5 column MACS BED file:
1 - Save the track header line with the tool "Select first lines from a
dataset" with the option to save the line 1.
2 - Create the working dataset that does not include the first line with
the tool "Remove beginning of a file" with the option "Remove first: 1"
lines.
3 - Split up the existing ID with the tool "Convert delimiters to
TAB" using the "Underscores" option.
This will split the fourth "name" column into three distinct columns;
the last new column will be used to create the new ID.
4 - Create a column in your file named "treatment1_peak_" with the tool
"Add column to an existing dataset"
This will create an extra column at the end of the BED file, to be used
in the new ID.
The file should now be:
c1 - chrom
c2 - start
c3 - end
c4 - the text "MACS"
c5 - the text "peak"
c6 - the text will be a number, second part of the new ID
c7 - score
c8 - the text "treatment1_peak_"
(or "treatment2_peak_" if the second file)
5 - Merge the two ID portions with the tool "Merge Columns together"
using the option of merging column c8 with c6.
This will create a new field, c9, with the text "treatment1_peak_N"
(or "treatment2_peak_N" for the second file), where "N" is whatever the
number in c6 was, per row.
6 - Create the new BED file, putting the new "name" column in the
correct order and omitting the columns not needed, using the tool "Cut
columns from a table" and pasting into the "Cut columns:" box this
text (no quotes):
c1,c2,c3,c9,c7
7 - Add the track line (removed in step 1) back with the tool
"Concatenate datasets tail-to-head" with the options set to concatenate
the output of step 1 as the first file and the output of step 6 as the
second file.
8 - Use the Edit Attributes form to change the file type back to BED and
assign all five columns to the proper attribute (click on pencil icon to
reach form).
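For comparison, steps 1 to 7 can be sketched as a short Python function. This is only an illustration, not a Galaxy tool: it assumes a 5-column, tab-separated MACS BED file whose first line is the track header, and the function name `relabel_macs_bed` and the sample data are made up.

```python
def relabel_macs_bed(lines, prefix="treatment1_peak_"):
    """Relabel the name column of a 5-column MACS BED file given as a list of lines."""
    # Steps 1-2: set the track header aside and work on the remaining lines.
    header, body = lines[0], lines[1:]
    out = [header.rstrip("\n")]
    for line in body:
        if not line.strip():
            continue  # ignore blank lines
        chrom, start, end, name, score = line.rstrip("\n").split("\t")
        # Step 3: split the ID on underscores and keep the trailing number.
        number = name.split("_")[-1]
        # Steps 4-6: build the new ID and keep the columns in BED order.
        out.append("\t".join([chrom, start, end, prefix + number, score]))
    # Step 7: the header comes first, followed by the relabeled records.
    return out


if __name__ == "__main__":
    # Illustrative input only; the header, chromosome, and score are invented.
    example = [
        "track name=example\n",
        "chr1\t300000\t300500\tMACS_peak_20\t55.1\n",
    ]
    print("\n".join(relabel_macs_bed(example)))
```

Calling it with `prefix="treatment2_peak_"` on the second file gives the two datasets distinct labels while keeping the peak numbers.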
Hopefully this will work (it did for my test) or is enough
information for you to work out the exact steps for your particular
datasets. Next time, please send data/tool questions directly "to" the
galaxy-user(a)bx.psu.edu mailing list. Replies should be sent "reply-all".
The outreach account is for other purposes.
las2017(a)med.cornell.edu wrote:
> I have two ChIPSeq datasets, and I am trying to find the common and
> distinct peaks between them and visualize them. I end up with a MACS bed
> file for each (listing a bunch of MACS_peaks). I then use the Intersect
> and Subtract tools from the Genomic Intervals tab and end up with the
> peaks I want. However, because of the way that MACS names its peaks,
> there can end up being some peaks named the same way in both files
> (because, for example, peak 20 in file1 is from position 300,000-300,500
> but peak 20 in file 2 is from position 320,000-320,500). So, I can end
> up with multiple peaks with the same name. Because all the peak names
> have the same form, it can also be difficult to tell them apart when
> visualizing them in the UCSC Genome Browser.
>
> What I would like to do is to be able to edit the bed file to change
> the text MACS_peak_<number> to, say, treatment1_peak_<number> so that
> peak 20 would now still be numbered 20 in both files, but would have a
> different label. This would be pretty easy to do using regular
> expressions and sed.
>
> I know there have been a few posts about text manipulation, and I
> know that there is a text manipulation tab, but I can't seem to find an
> easy way to do what I want to do.
>
> Any advice?
>
> Thanks, luce
--
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org/wiki/Support
11 years, 1 month