Hi Amit,

The problem is with the MAF files themselves: they are truncated. This occurs when MAF data (or any data in excess of ~100k lines) is extracted from the Table browser. This data greatly exceeds that:

Database: dm3
Primary Table: multiz15way
Row Count: 1,633,505

The ends of both dataset #1 and dataset #3 have this warning:

---------------------------------------------------------------------------
procedures have exceeded timeout: 1200 seconds, function has ended.
---------------------------------------------------------------------------

Instead, you have two options:

1 - Obtain the MAF files from the UCSC downloads area. Go to http://genome.ucsc.edu, then in the left blue side bar select "Downloads", then navigate to the data for dm3. The multiz (MAF) files will be under the Conservation track data.

2 - This same MAF data is cached as a local data source on usegalaxy.org. If you query the blocks using an interval/BED file of coordinates (assigned to database "dm3"), you can obtain the intervals that way. UCSC has the chromosome names and lengths on the D. mel home page, in a table found by clicking on the link near the top named "Sequences". Simple files built from this info can be pasted into the "Get Data -> Upload file" tool form to create one-line query datasets, like this one:

I was able to run "Extract MAF blocks" then "MAF to Interval" with no problems. I don't know if doing this one chromosome at a time is required, but it certainly will work and will not exceed any resources. I suggest doing this once, building a workflow, then running it on the rest in batch.

Hopefully one of these options works out for you!

Jen
Galaxy team

On 5/16/14 6:59 AM, Amit Pande wrote:
Dear Jennifer,
I uploaded the data both ways, i.e. via FTP and through the UCSC browser, but all attempts to extract MAF blocks between insect species have failed. I need your help in this regard, so I have shared my history with you.
warm regards, Amit.
On Thu, May 15, 2014 at 5:07 PM, Jennifer Jackson <jen@bx.psu.edu> wrote:
Hi Amit,
Is this occurring when you upload a MAF "multizXXway" file obtained from UCSC downloads (http://genome.ucsc.edu) to Galaxy main (http://usegalaxy.org)? Are you uploading using FTP? https://wiki.galaxyproject.org/Support#Loading_data
The Table browser is generally a poor choice for extracting more than a few regions of MAF data per query, as there are limits on how many lines of output will be sent over. Incomplete transfers are common. This error could be related to a format or datatype assignment problem caused by an incomplete transfer.
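One way to check a MAF file for an incomplete transfer before uploading it is to inspect the end of the file and count its alignment blocks. The sketch below is only an illustration and is not part of Galaxy or UCSC: the function names `count_maf_blocks` and `looks_truncated` are hypothetical, and the default marker text "exceeded timeout" is an assumption taken from the timeout warning reported elsewhere in this thread. It relies on the MAF convention that each alignment block starts with a line beginning with "a".

```python
import os


def count_maf_blocks(path):
    """Count alignment blocks; in MAF each block starts with an 'a' line."""
    count = 0
    with open(path, "r", errors="replace") as handle:
        for line in handle:
            stripped = line.rstrip("\n")
            if stripped == "a" or stripped.startswith("a "):
                count += 1
    return count


def looks_truncated(path, marker="exceeded timeout", tail_bytes=4096):
    """Return True if the last tail_bytes of the file contain the marker.

    The marker default is an assumption based on the warning text quoted
    in this thread; other truncation causes will not be detected.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as handle:
        handle.seek(max(0, size - tail_bytes))
        tail = handle.read().decode("utf-8", errors="replace")
    return marker in tail
```

A file that ends with the warning, or whose block count is far below the row count reported by the Table browser, is a sign that the transfer did not complete.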
Please give FTP loading a try if you have not already. Then, if problems continue, you can share a history link with me. Note which dataset is the MAF uploaded via FTP. This is how to share: https://wiki.galaxyproject.org/Learn/Share
Best,
Jen
Galaxy team
Going forward, please ask questions on our new forum that is replacing this list (very soon now): https://wiki.galaxyproject.org/Support#Biostar
On 5/14/14 10:52 PM, Amit Pande wrote:
Dear Galaxy,
I am trying to import a multiz alignment file for all the insect species from the UCSC genome browser. Galaxy does not recognize the number of blocks in the multiz file, as there is a question mark in the file format view (? blocks). Then, when I try to use the tool "Extract MAF blocks given a set of genomic intervals" (https://usegalaxy.org/tool_runner?tool_id=Interval2Maf1), there is an error saying
"An error occurred with this dataset:191757 MAF blocks converted to Genomic Intervals for species dm3. There was a problem processing your input: exceptions must be old-style classes or derived from BaseException, not str" and even when the tool runs it shows the following message: "This is a new dataset and not all of its data are available yet"
Please look into the problem.
warm regards, Amit.
___________________________________________________________
The Galaxy User List is being replaced by the Galaxy Biostar User Support Forum at https://biostar.usegalaxy.org/
Posts to this list will be disabled in May 2014. In the meantime, you are encouraged to post all new questions to Galaxy Biostar.
For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
-- Jennifer Hillman-Jackson http://galaxyproject.org
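
The one-line query datasets described in Jen's reply (one interval per chromosome, covering its full length) could be produced with something like the sketch below. This is only an illustration: the function name `whole_chrom_bed_lines` is hypothetical, and the chromosome name and length in the example are placeholders, not real dm3 values. The actual names and lengths should be taken from the UCSC "Sequences" table.

```python
def whole_chrom_bed_lines(chrom_sizes):
    """Yield one tab-separated BED line (chrom, start, end) per chromosome.

    BED starts are 0-based and ends are exclusive, so an interval covering
    a whole chromosome runs from 0 to the chromosome length.
    """
    for name, length in chrom_sizes:
        if length <= 0:
            raise ValueError(f"length for {name} must be positive")
        yield f"{name}\t0\t{length}"


# Placeholder values for illustration only; not real dm3 lengths.
example_sizes = [("chrExample", 1000)]
for line in whole_chrom_bed_lines(example_sizes):
    print(line)
```

Each printed line can be pasted into the "Get Data -> Upload file" form as its own dataset, with the database set to dm3, and then used as the interval input for "Extract MAF blocks".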