I ran a few tests and found that changing the file suffix to .txt when
using the "autodetect" upload type sped up the loading process
considerably. As the final result is a Galaxy dataset identical to what
is produced when using the existing suffix, this is something I would
recommend that you try next time.
For my test, I took one of your files and changed the suffix directly; no
other changes were made to the content, as it was already a
tab-delimited text file. I didn't continue the testing to specify
the datatype at upload (tabular would be the correct choice), but that
change may also speed up import slightly. The .txt suffix change alone
was dramatic, and the upload was quick (I ran a side-by-side comparison
of an original file and a file modified to use the .txt suffix).
The general reason behind this is that Galaxy will interpret data to
detect and confirm datatypes during upload to create associated metadata
needed for tool use. Detection is a convenience option that comes at a
cost (compute resource and time). If you can provide this information
instead, the detection portion of the process can be avoided,
confirmation and metadata creation can be started directly, and the
result is a quicker upload.
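As a rough illustration of the suffix change described above, here is a
minimal Python sketch that copies a file to a new name ending in .txt
without touching its content. The function name and the example file
name are hypothetical, and the sketch assumes the file is already
tab-delimited text, as in my test.

```python
import shutil
from pathlib import Path


def copy_with_txt_suffix(src):
    """Copy `src` next to itself with its last suffix replaced by .txt.

    The content is copied byte for byte; only the file name changes.
    Hypothetical helper for illustration, not part of Galaxy.
    """
    source = Path(src)
    target = source.with_suffix(".txt")
    if target == source:
        # The file already ends in .txt, so there is nothing to do.
        return source
    shutil.copyfile(source, target)
    return target
```

For example, copy_with_txt_suffix("peaks.tabular") would create
"peaks.txt" in the same directory, and that copy is the one to upload.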
Hopefully this helps for next time,
On 5/22/12 8:49 AM, Estorninho, Megan wrote:
> Yes I am still experiencing problems. My files are only around 80-120MB and are taking hours to load if at all.
> Thanks for your help,
> Sent from my iPhone
> On 22 May 2012, at 14:38, "Jennifer Jackson"<jen(a)bx.psu.edu> wrote:
>> Hello Megan,
>> Are you still experiencing problems now? Galaxy may have been busy
>> immediately following the resolution of the cluster problem, although
>> your problem does appear to be unrelated.
>> It sounds like you are uploading files through a browser. A better choice
>> would be to use FTP. This is required for datasets approaching or
>> exceeding 2G in size.
>> Files that are < 2G, really any file over ~500MB, can also benefit from
>> FTP upload. An FTP client tracks the progress of an upload and can
>> resume an interrupted transfer. http://wiki.g2.bx.psu.edu/FTPUpload
>> Hopefully this helps,
>> Galaxy team
>> On 5/21/12 10:21 AM, Estorninho, Megan wrote:
>>> I have been unable to upload data files into Galaxy Main since Friday 18th May 2012. Today is my fourth day of attempting uploads. Refreshing and leaving the files to upload overnight does not work.
>>> Although Jennifer has stated the bug has been fixed at 5.30pm today I am still unable to upload data files. I thought I may be exceeding maximum file capacity but I am well below at only 1.8Gb.
>> Jennifer Jackson
Hi, I am a new user installing my own Galaxy,
and I am developing some tools.
I am now facing a challenge:
I need to get the email address of the currently logged-in user from my tools,
but I did not find any API that can do this.
So I was wondering whether there is a database that stores the email address.
I need to identify the logged-in user's email and return it to my tool.
My name is Christopher Terranova and I am an M.S. student at the University of
Buffalo SUNY. I have been attempting to analyze my MACS data using Galaxy, already
have my custom peaks on the UCSC Genome Browser, and have some specific questions.
I am attempting to show how my peaks (and peak center coordinates) relate to gene
units (+/-TSS and genic) and intergenic regions specifically. I have been
attempting to do this two different ways and am not sure if I am doing it
correctly. Below I list the steps I have been using, with particular
questions highlighted near my problem. I would also like to apologize for this
extended e-mail; however, I have only been working with Galaxy for approximately
a month, and figuring out all the manipulations is rather difficult. If someone
can answer my questions I would greatly appreciate it!
These questions relate specifically to promoters-
Retrieving TSS coordinates
1. Go to the UCSC Genome Browser, click "Tables" at the top of the page, and
select mouse mm9 as the organism
2. select "RefSeq genes" in tracks, BED as the "output format" and check "Send
output to Galaxy"
3. click "Get output" then "Send output to Galaxy", and you are redirected to
your Galaxy account, which contains an additional dataset
4. use the Galaxy "Filter" tool (left column) to select all "+" strand genes
5. use the "Cut" tool (left column) to extract columns 1,2,2,4,5,6 (**is the
c2 column repeated twice??**) in order to build a BED file containing the TSS
for all "+" strand genes
6. do the same for the genes on the "-" strand
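
To make the Cut step concrete: extracting columns 1,2,2,4,5,6 uses the
start coordinate (c2) as both start and end, which gives a
single-position interval at the TSS of a "+" strand gene. Below is a
minimal Python sketch of that idea. It assumes 6-column BED rows (chrom,
start, end, name, score, strand) and assumes that for "-" strand genes
the TSS is taken from the end coordinate (c3), which the steps above do
not spell out. The function name is hypothetical, and the sketch copies
the column values as-is without adjusting for BED's 0-based coordinates.

```python
def tss_row(bed_fields):
    """Return a 6-column BED row whose start and end are both the TSS.

    Assumes bed_fields = [chrom, start, end, name, score, strand].
    "+" strand: TSS taken from the start column (like Cut c1,c2,c2,c4,c5,c6).
    "-" strand: TSS taken from the end column (assumed: Cut c1,c3,c3,c4,c5,c6).
    """
    chrom, start, end, name, score, strand = bed_fields[:6]
    tss = start if strand == "+" else end
    return [chrom, tss, tss, name, score, strand]


if __name__ == "__main__":
    # Made-up example rows, not taken from a real RefSeq table.
    print(tss_row(["chr1", "100", "500", "gene_a", "0", "+"]))
    print(tss_row(["chr1", "100", "500", "gene_b", "0", "-"]))
```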
Computing peak center coordinates
1. In Galaxy, select the tool "Compute expression on every row" in the left
column (Text manipulation section)
2. as an expression, select c2+(c3-c2+1)/2, round result "YES"
3. select the dataset containing the peaks for one of the TFs (HNF4a or CBPA),
and click "execute"; this creates a new dataset with an additional column
containing the coordinate of the peak center.
4. now select the tool "Cut", and extract the columns c1,c6,c6,c4,c5 (**is the
c6 column repeated twice??**) to create a new BED file containing the peak center
5. edit the metadata of this new dataset (clicking on the small pencil icon),
and change the format to BED
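
The peak center steps above can be sketched in Python as follows. This
is only an illustration under assumptions: the peaks file is taken to
have 5 columns (chrom, start, end, name, score), so the computed center
would land in column 6, and the function names are hypothetical. Note
that values ending in exactly .5 may round differently depending on the
rounding rule used, so the example avoids that case.

```python
def peak_center(start, end):
    """Peak center as in the expression c2+(c3-c2+1)/2, rounded."""
    return round(start + (end - start + 1) / 2)


def center_row(fields):
    """Build a 5-column row with the center as both start and end.

    Mirrors cutting c1,c6,c6,c4,c5 after the center was added as c6.
    Assumes fields = [chrom, start, end, name, score].
    """
    chrom, start, end, name, score = fields[:5]
    center = str(peak_center(int(start), int(end)))
    return [chrom, center, center, name, score]


if __name__ == "__main__":
    # Made-up peak: 100 + (201 - 100 + 1) / 2 = 151
    print(center_row(["chr1", "100", "201", "peak_1", "55.2"]))
```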
Computing distance to closest TSS
1. select the tool "Fetch closest non-overlapping feature", select the new
dataset containing the peak center coordinates, and the dataset containing the
mouse TSS. A new dataset is created containing for each peak, the closest TSS
2. compute the distance from the peak center to the closest TSS using the
"Compute expression on every row" tool (**what expression should I use to do this**)
3. plot the distribution using the "Histogram of a numeric column" tool.
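
One possible way to think about the distance step, as a hedged Python
sketch: subtract the TSS position from the peak center position, and
flip the sign for "-" strand genes so that negative values always mean
"upstream of the TSS". The sign convention and the function name are
assumptions for illustration. Which columns hold the peak center and
the TSS depends on the layout of the dataset produced by the "Fetch
closest" step, so no column numbers are assumed here.

```python
def distance_to_tss(peak_center, tss, strand="+"):
    """Signed distance from a peak center to a TSS.

    Negative means upstream of the TSS, positive means downstream,
    relative to the direction of transcription (assumed convention).
    """
    offset = peak_center - tss
    return offset if strand == "+" else -offset


if __name__ == "__main__":
    # Made-up positions for illustration.
    print(distance_to_tss(1500, 1000, "+"))
    print(distance_to_tss(1500, 1000, "-"))
```

If only the unsigned distance is needed for the histogram, the absolute
value of the difference between the two positions is enough.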
Second approach: I understand this does not identify the peak center closest to
the TSS or a particular strand; however, I still have a couple of questions.
Now we have a data set corresponding to all human RefSeqs (34,765) and we want to
convert this set into one corresponding to human promoter regions. First, we will
make sure our data set just contains the start and end coordinates of the genes.
Select the "Text Manipulation" tool and then "Cut" columns from a table. Set "cut
columns" to "c1,c2,c3,c4,c6" (**Is this the right c1... conformation??**). Make
sure our previously downloaded RefSeq data set is selected and click on
"Execute". When this is finished, click on the pencil icon to assign names to the
columns. Set name to "RefSeqs", click "save" and change the data type to
"interval" and click "save". Now click the pencil icon again to define the
columns. Set the start column to "2" and the end column to "3", the strand column
to "5" and the "Name/Identifier" column to "4" and click on "save". Now, go to
the "Operate on Genomic Intervals" section of the "Tools" menu and select "Get
flanks" to get the flanking regions for the RefSeq data set we just created. Make
sure our RefSeq data set is selected and we want to get the "upstream" flanking
regions for this data set. Set the length of the flanking region to 1000 to get
the coordinates for 1kb upstream. Later on we could use different intervals.
Click on "Execute". When this has finished, go to "Operate on Genomic Intervals"
again and select "Join". Now set "First query" to "Get flanks.." and "Second
query" to the peaks file of the "MACS" output and then click on "Execute". We now
end up with 710 regions where our ChIP-Seq peaks overlap with our 1kb upstream
region (promoter region).
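
The flank-and-join logic described above can be sketched in Python as
below. This is a minimal illustration, not Galaxy's implementation: it
assumes half-open intervals (start included, end excluded), takes
"upstream" to mean before the start on the "+" strand and after the end
on the "-" strand, and uses hypothetical function names. The actual
"Get flanks" and "Join" tools may handle edge cases differently.

```python
def upstream_flank(start, end, strand, length=1000):
    """Return (flank_start, flank_end) for the region upstream of a gene.

    "+" strand: the `length` bases before the gene start.
    "-" strand: the `length` bases after the gene end.
    """
    if strand == "+":
        return max(0, start - length), start
    return end, end + length


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


if __name__ == "__main__":
    # Made-up gene and peak on the same chromosome.
    flank = upstream_flank(5000, 8000, "+")
    print(flank, overlaps(flank[0], flank[1], 4900, 5100))
```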
Lastly, while not discussed here, what exactly does the offset command do when
Thank you very much and again, I apologize for the extensive questions!
Prior mailing list Q/A archive searching:
http://wiki.g2.bx.psu.edu/Support (also has mailing list search link &
other custom google search links)
On 6/1/12 12:38 PM, Yang, Yanming wrote:
> Hi Jennifer,
> Is there an archive for questions-answers or problems-solutions back-forth emails for Galaxy, so that when I have problems/issues I can search the archive (database) first to see if they were encountered and already fixed? If there is, would you please show me the link?
> Yanming Yang, Ph.D.
> Translational Sciences Lab
> Florida State University, College of Medicine
> 1115 W Call Street, MSR 1350-M
> Tallahassee, FL 32306
> Office: 850-645-0019
I am attempting to use tophat>cufflinks>cuffmerge>cuffdiff to compare transcript expression in 3 samples (no replicates, Illumina single-end reads). Using the built-in UCSC mm9 reference genome I can complete the analysis just fine, with the caveat that there is no annotation.
When I repeat the analysis using the Illumina iGenomes UCSC mm9 .gtf annotation file I get the following error in Cufflinks:
An error occurred running this job: cufflinks v1.3.0
cufflinks -q --no-update-check -I 300000 -F 0.100000 -j 0.150000 -p 8 -G /galaxy/main_pool/pool5/files/004/309/dataset_4309547.dat -N
Error running cufflinks.
return code = -11
cufflinks: /lib64/libz.so.1: no version information available
I have set the identifier/build as "Mouse July 2007 (NCBI37/mm9) (mm9)", so that does not seem to be the problem. Suggestions as to how to fix this problem OR add annotations to the already completed analysis would be terrific.
The last few times I have tried to initiate a galaxy instance on the cloud I have gotten messages like the following:
* 18:42:04 - Master starting
* 18:42:05 - Completed initial cluster configuration.
* 18:42:09 - Prerequisites OK; starting service 'SGE'
* 18:42:20 - Configuring SGE...
* 18:42:29 - Successfully setup SGE; configuring SGE
* 18:42:29 - Saved file 'persistent_data.yaml' to bucket 'cm-26cac39701f0918ab9a9dca54f69e925'
* 18:42:29 - Saved file 'cm_boot.py' to bucket 'cm-26cac39701f0918ab9a9dca54f69e925'
* 18:42:29 - Problem connecting to bucket 'cm-26cac39701f0918ab9a9dca54f69e925', attempt 1/5
* 18:42:32 - Saved file 'cm.tar.gz' to bucket 'cm-26cac39701f0918ab9a9dca54f69e925'
* 18:42:32 - Saved file 'test.clusterName' to bucket 'cm-26cac39701f0918ab9a9dca54f69e925'
* 18:44:34 - Initializing a 'Galaxy' cluster.
* 18:44:34 - Retrieved file 'snaps.yaml' from bucket 'cloudman' to 'cm_snaps.yaml'.
* 18:45:25 - Error mounting file system '/mnt/galaxyData' from '/dev/sdg3', running command '/bin/mount /dev/sdg3 /mnt/galaxyData' returned code '32' and following stderr: 'mount: you must specify the filesystem type '
* 18:45:27 - Prerequisites OK; starting service 'Postgres'
* 18:45:27 - PostgreSQL data directory '/mnt/galaxyData/pgsql/data' does not exist (yet?)
* 18:45:27 - Configuring PostgreSQL with a database for Galaxy...
* 18:45:39 - Prerequisites OK; starting service 'Galaxy'
* 18:45:39 - Setting up Galaxy application
* 18:45:40 - Retrieved file 'universe_wsgi.ini.cloud' from bucket 'cloudman' to '/mnt/galaxyTools/galaxy-central/universe_wsgi.ini'.
* 18:45:40 - Retrieved file 'tool_conf.xml.cloud' from bucket 'cloudman' to '/mnt/galaxyTools/galaxy-central/tool_conf.xml'.
* 18:45:40 - Retrieved file 'tool_data_table_conf.xml.cloud' from bucket 'cloudman' to '/mnt/galaxyTools/galaxy-central/tool_data_table_conf.xml.cloud'.
* 18:45:40 - Starting Galaxy...
* 18:45:51 - Saved file 'persistent_data.yaml' to bucket 'cm-26cac39701f0918ab9a9dca54f69e925'
* 18:49:34 - Galaxy daemon not running.
* 18:49:34 - Galaxy service state changed from 'Starting' to 'Error'
* 18:49:35 - Saved file 'persistent_data.yaml' to bucket 'cm-26cac39701f0918ab9a9dca54f69e925'
* 18:49:41 - Galaxy daemon not running.
* 18:49:58 - Galaxy daemon not running.
* 18:50:15 - Galaxy daemon not running.
I am using 861460482541/galaxy-cloudman-2011-03-22, which is supposed to be the current version.
Thomas Randall, PhD
Bioinformatics Scientist, Contractor
National Institute of Environmental Health Sciences
P.O. Box 12233, Research Triangle Park, NC 27709
I want to fetch sequences from the soybean genome according to a GFF file. My
GFF3 file and genome file are attached to this email, because the format is
not easy to recognize if I paste it into the email. It keeps
reporting the error:
An error occurred running this job: Traceback (most recent call last):
line 288, in <module>
if __name__ == "__main__": __main__()
Could you please tell me where the problem is?