I find the only reliable way to upload files >1 GB is via a URL.

Ian

Quoting Peter Andrews <Peter.Andrews@dartmouth.edu>:
I also find that uploading a file rarely works -- it just displays the blue loading arrow forever.
On 4/7/2010 9:23 PM, galaxy-user-request@lists.bx.psu.edu wrote:
Send galaxy-user mailing list submissions to galaxy-user@lists.bx.psu.edu
To subscribe or unsubscribe via the World Wide Web, visit http://lists.bx.psu.edu/listinfo/galaxy-user or, via email, send a message with subject or body 'help' to galaxy-user-request@lists.bx.psu.edu
You can reach the person managing the list at galaxy-user-owner@lists.bx.psu.edu
When replying, please edit your Subject line so it is more specific than "Re: Contents of galaxy-user digest..."
Today's Topics:
  1. Re: Upload file (Daniel Blankenberg)
  2. Re: Upload file (Yanwei Tan)
  3. Re: Upload file (Daniel Blankenberg)
  4. Problem w/Cheetah, repeating dataset input (Jesse Erdmann)
  5. Re: Quality based trimming (Florent Angly)
----------------------------------------------------------------------
Message: 1
Date: Wed, 7 Apr 2010 09:53:19 -0400
From: Daniel Blankenberg <dan@bx.psu.edu>
To: Yanwei Tan <Tan@nbio.uni-heidelberg.de>
Cc: galaxy-user@lists.bx.psu.edu
Subject: Re: [galaxy-user] Upload file
Message-ID: <37031484-8E2E-4A9C-A027-D0D75F8D0799@bx.psu.edu>
Content-Type: text/plain; charset=us-ascii
Hi Wei,
3 GB is not that large a file and should upload without issue. How are you uploading the file? May I suggest that you try placing the file in a web-accessible location and providing the URL to the upload tool in the paste box.
Galaxy does accept the upload of individual gzipped files, which can greatly reduce the time required for the transfer.
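A quick sketch of that compression step (illustrative only; the file name reads.fastq is made up, and this is plain Python, not anything Galaxy ships):

    import gzip
    import shutil

    # Compress a FASTQ before transfer; reads.fastq.gz is what would be uploaded.
    with open("reads.fastq", "rb") as src, gzip.open("reads.fastq.gz", "wb") as dst:
        shutil.copyfileobj(src, dst)

On the command line, gzip reads.fastq achieves the same result.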
Thanks,
Dan
On Apr 6, 2010, at 6:04 PM, Yanwei Tan wrote:
Hi Dan,
I have run into a problem with the speed of uploading a file into Galaxy. My data is a FASTQ file of around 3 GB; I started the upload two days ago and it still has not finished. I was wondering if I should try a zip-compressed file instead.
Many thanks! Wei
On 4/6/10 5:00 PM, Daniel Blankenberg wrote:
Hi Florent,
Thanks very much for the comments. A sliding window sounds like an excellent approach: allow users to specify the window size, step size, an aggregation action to perform on the window (min, max, sum, mean, etc.), a comparison method (<, <=, ==, etc.) and a threshold quality value; allowing users to specify which ends to trim (both, or only one or the other) would also likely be useful. Would it also be desirable to allow specifying a number of quality scores that can be excluded from the aggregation action (the zero low-quality base pairs in your example)? A window size of 1 would handle the simple case of only trimming the very ends, while still allowing the user to configure more complex windowing schemes. Thoughts?
Thanks,
Dan
On Apr 6, 2010, at 4:00 AM, Florent Angly wrote:
Thanks for your reply, Daniel.

> You are correct that there is not currently a tool to trim directly by
> quality in Galaxy; currently the Summary statistics and boxplot tools are
> used to determine a good cutoff for use in the trim-by-column tool;
> percentage of read length can be more useful on variable-length reads.
> However, adding a tool that can directly trim reads based upon a threshold
> quality score seems like a natural fit for Galaxy, when uniform read length
> is not present at the start and/or not a requirement at the end and the
> percentage-of-read-length method is not sufficient.

That's right... I did not even think about using the boxplot tool to find how much to trim the ends. My reads all have the same length, but still, it seems more natural to only trim as much as needed and no more. For example, I have some reads that are completely low quality and should be entirely trimmed/removed, whereas some might be of good quality over almost all their length.

> Let's verify that you are looking for something like this, where 'x' is a
> low-quality base and 'o' is a high-quality base:
> Start with:
> xxxooooxxooooxxx
> after trimming ends for 'x':
> ooooxxoooo
> So trimming happens only from the ends and stops as soon as a base above
> the threshold is found, and internal low-quality bases are not considered.

It's probably better to use a short sliding window (of, say, 5 bp) and trim the ends until the window has no more than, say, zero low-quality base pairs. So, the following sequence would be converted from:
xxxoxooooooxxooooooxoxxx
to:
ooooooxxoooooo
Florent
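A small sketch of the windowed end-trimming being discussed here, combining the window/threshold parameters above with this example (illustrative only, not an existing Galaxy tool; it assumes numeric per-base quality scores, with bases below the threshold playing the role of 'x'):

    def trim_ends(quals, window=5, threshold=20, max_low=0):
        """Return (start, end) so that, scanning in from each end, trimming
        stops at the first window of `window` bases containing at most
        `max_low` bases below `threshold`."""
        low = [q < threshold for q in quals]

        def first_clean(flags):
            # Index of the first window with at most max_low low-quality bases.
            for i in range(len(flags) - window + 1):
                if sum(flags[i:i + window]) <= max_low:
                    return i
            return len(flags)  # no acceptable window: trim everything

        start = first_clean(low)
        end = len(low) - first_clean(low[::-1])
        return (start, end) if start < end else (0, 0)

    # Reproduce the 'x'/'o' example above, using score 0 for 'x' and 40 for 'o'.
    quals = [0 if c == "x" else 40 for c in "xxxoxooooooxxooooooxoxxx"]
    start, end = trim_ends(quals)
    print("".join("o" if q else "x" for q in quals[start:end]))  # ooooooxxoooooo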
--
Yanwei Tan
Institute of Neurobiology
1.OG, AG Bading
Im Neuenheimer Feld 364
University of Heidelberg
69120 Heidelberg
Germany
Tel: +49-6221-548319
Fax: +49-6221-546700
------------------------------
Message: 2
Date: Wed, 07 Apr 2010 17:21:37 +0200
From: Yanwei Tan <Tan@nbio.uni-heidelberg.de>
To: Daniel Blankenberg <dan@bx.psu.edu>
Cc: galaxy-user@lists.bx.psu.edu
Subject: Re: [galaxy-user] Upload file
Message-ID: <4BBCA301.80907@nbio.uni-heidelberg.de>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hi Dan,
Many thanks for your reply. I just uploaded the file from my computer by browsing to its location. If I upload a zip-compressed file, should I choose the txtseq.zip format? My data is a FASTQ file stored as plain text. And how could I put the file in a web-accessible location? Do you mean an FTP server?
Thanks for your help! Wei
On 4/7/10 3:53 PM, Daniel Blankenberg wrote:
Hi Wei,
3 GB is not that large a file and should upload without issue. How are you uploading the file? May I suggest that you try placing the file in a web-accessible location and providing the URL to the upload tool in the paste box.
Galaxy does accept the upload of individual gzipped files, which can greatly reduce the time required for the transfer.
Thanks,
Dan
------------------------------
Message: 3
Date: Wed, 7 Apr 2010 11:32:05 -0400
From: Daniel Blankenberg <dan@bx.psu.edu>
To: Yanwei Tan <Tan@nbio.uni-heidelberg.de>
Cc: galaxy-user@lists.bx.psu.edu
Subject: Re: [galaxy-user] Upload file
Message-ID: <54E8CB9D-5C36-4A20-8C8A-188B3F908012@bx.psu.edu>
Content-Type: text/plain; charset=us-ascii
Hi Wei,
To upload a gzipped file (e.g. somefile.fastq.gz), just upload as you normally would (don't set the format to txtseq.zip; that is a special format) and the file will be gunzipped automatically. Zip files are not yet supported.
As for a web-accessible location, you can specify HTTP(S) and FTP locations by entering e.g. http://someserver.org/somefile.fastq or ftp://someserver.org/somefile.fastq into the 'paste' box in the upload tool, if you have access to this type of resource.
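As an aside on the "gunzipped automatically" part: gzipped data is easy to recognize by its two-byte magic number, so a loader can sniff it and decompress transparently. A minimal sketch of that general idea (an assumption about the approach, not Galaxy's actual code; open_maybe_gzipped is a hypothetical helper):

    import gzip

    def open_maybe_gzipped(path):
        # Hypothetical helper: sniff the gzip magic bytes, then open accordingly.
        with open(path, "rb") as handle:
            magic = handle.read(2)
        if magic == b"\x1f\x8b":          # the two-byte gzip magic number
            return gzip.open(path, "rt")  # decompress transparently
        return open(path, "r")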
Thanks,
Dan
On Apr 7, 2010, at 11:21 AM, Yanwei Tan wrote:
Hi Dan,
Many thanks for your reply. I just uploaded the file from my computer by browsing to its location. If I upload a zip-compressed file, should I choose the txtseq.zip format? My data is a FASTQ file stored as plain text. And how could I put the file in a web-accessible location? Do you mean an FTP server?
Thanks for your help! Wei
------------------------------
Message: 4
Date: Wed, 7 Apr 2010 16:25:37 -0500
From: Jesse Erdmann <jerdmann@umn.edu>
To: galaxy-user@bx.psu.edu
Subject: [galaxy-user] Problem w/Cheetah, repeating dataset input
Message-ID: <m2h97174eb21004071425o9cba9384i648744371304d210@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Hi all,
I'm using the following XML:
...
<inputs>
  <param type="data" format="txt" name="seqmeta" label="PAPS.seqmeta"/>
  <param type="data" format="txt" name="lib_info" label="Label Info"/>
  <repeat name="to_merge" title="Evaluation Inputs and Results">
    <param type="select" name="pmm" label="Mismatch %" help="Maximum % of mismatch between the construct sequence and the comparison sequence.">
      <option value="10">10%</option>
      <option value="15">15%</option>
      <option value="20">20%</option>
    </param>
    <param type="text" name="max_length" label="Max Construct Length" value="25" />
    <param format="fasta" name="fasta_in" type="data" label="FastA Output"/>
    <param format="txt" name="seq_stats" type="data" label="Sequence Stats"/>
  </repeat>
</inputs>
...
<configfiles>
  <configfile name="file_info">
    #for $merge_set in enumerate( $to_merge )
    #silent sys.stderr.write($merge_set.__str__() + "\n")
    #silent sys.stderr.write($merge_set.fasta_in.__str__())
    #end for
  </configfile>
</configfiles>
...
And getting the resulting output:
127.0.0.1 - - [07/Apr/2010:15:31:50 -0500] "POST /admin/tool_reload HTTP/1.1" 200 - "http://localhost:8080/admin/reload_tool" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2"
127.0.0.1 - - [07/Apr/2010:15:31:59 -0500] "POST /tool_runner/index HTTP/1.1" 200 - "http://localhost:8080/tool_runner/index" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2"
galaxy.jobs.schedulingpolicy.roundrobin DEBUG 2010-04-07 15:32:03,765 RoundRobin queue: user/session did not exist, created new jobqueue for session = 6
galaxy.jobs DEBUG 2010-04-07 15:32:03,766 job 234 put in policy queue
galaxy.jobs.schedulingpolicy.roundrobin DEBUG 2010-04-07 15:32:03,766 RoundRobin queue: retrieving job from job queue for session = 6
galaxy.jobs DEBUG 2010-04-07 15:32:03,766 dispatching job 234 to local runner
galaxy.jobs DEBUG 2010-04-07 15:32:05,423 job 234 dispatched
(0, {'seq_stats': <galaxy.tools.DatasetFilenameWrapper object at 0x572b390>, '__index__': 0, 'max_length': <galaxy.tools.InputValueWrapper object at 0x572b8d0>, 'fasta_in': <galaxy.tools.DatasetFilenameWrapper object at 0x572b810>, 'pmm': <galaxy.tools.SelectToolParameterWrapper object at 0x572b290>})
127.0.0.1 - - [07/Apr/2010:15:32:03 -0500] "GET /history HTTP/1.1" 200 - "http://localhost:8080/tool_runner/index" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2"
galaxy.jobs.runners.local ERROR 2010-04-07 15:32:07,356 failure running job 234
Traceback (most recent call last):
  File "/Users/jerdmann/Python/galaxy-central/lib/galaxy/jobs/runners/local.py", line 55, in run_job
    job_wrapper.prepare()
  File "/Users/jerdmann/Python/galaxy-central/lib/galaxy/jobs/__init__.py", line 386, in prepare
    config_filenames = self.tool.build_config_files( param_dict, self.working_directory )
  File "/Users/jerdmann/Python/galaxy-central/lib/galaxy/tools/__init__.py", line 1364, in build_config_files
    f.write( fill_template( template_text, context=param_dict ) )
  File "/Users/jerdmann/Python/galaxy-central/lib/galaxy/util/template.py", line 9, in fill_template
    return str( Template( source=template_text, searchList=[context] ) )
  File "/Users/jerdmann/Python/galaxy-central/eggs/Cheetah-2.2.2-py2.5-macosx-10.3-fat-ucs2.egg/Cheetah/Template.py", line 1004, in __str__
    return getattr(self, mainMethName)()
  File "cheetah_DynamicallyCompiledCheetahTemplate_1270672326_31_15888.py", line 86, in respond
NotFound: cannot find 'fasta_in' while searching for 'merge_set.fasta_in'
So, I'm printing the merge_set dictionary and seeing a fasta_in key, but when I try to use fasta_in it cannot be found. I'm assuming this is probably a syntax oversight on my part, but I'm not seeing it.
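One likely culprit, judging from the "(0, {...})" in the log above: enumerate() yields (index, item) tuples, so $merge_set in the template is the whole tuple rather than the repeat's value dict, and the attribute lookup never reaches the dataset wrapper. A plain-Python sketch of the same behaviour (the dataset names below are made up):

    to_merge = [{"fasta_in": "dataset_1.dat", "seq_stats": "dataset_2.dat"}]

    for merge_set in enumerate(to_merge):
        # Prints the whole (index, dict) tuple -- matching "(0, {...})" in the
        # log -- but the tuple itself has no 'fasta_in' attribute or key.
        print(merge_set)

    for index, merge_set in enumerate(to_merge):
        # Unpacking the tuple (or dropping enumerate entirely) restores access.
        print(index, merge_set["fasta_in"])

If that is indeed the cause, iterating with #for $merge_set in $to_merge (or unpacking the index in the #for) should make $merge_set.fasta_in resolvable.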
--
--------------
Peter Andrews
Programmer
Computational Genetics Lab
Dartmouth Hitchcock Medical Center
(603) 653-9963