user and dataset management with LDAP: some questions
by Louise-Amélie Schmitt
Hello everyone
I have a couple of questions regarding user and dataset management.
1) We use LDAP for user registration and login. Would it be possible
to automatically retrieve the LDAP users' groups and create the
corresponding groups in Galaxy (and, of course, put each user in their
respective groups)? See the sketch after this list for the kind of
lookup I mean.
2) Is it possible, still using LDAP, to delete a user and all their
datasets?
3) Is it possible to automatically delete, for instance, any dataset
that was added more than a couple of months ago?
4) Is there a not-too-intrusive way to add a column to the user list
showing the disk space each user consumes?
5) I tried to see how the API works but I have to admit I didn't get a
thing. I read the scripts/api/README file, and there I saw that one
needs to access the user preferences to generate an API key. What is its
purpose? Is there a way to do this when using LDAP (and therefore having
no access to that key generator)?
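(For question 1, a minimal sketch of the lookup I have in mind, assuming
a POSIX-style schema where group entries carry memberUid attributes; the
server URI, base DN, and uid below are made-up placeholders:)

    import ldap

    LDAP_URI = "ldap://ldap.example.org"        # placeholder server
    GROUP_BASE = "ou=Groups,dc=example,dc=org"  # placeholder base DN

    def groups_for(uid):
        # Return the name of every group listing this uid as a member.
        conn = ldap.initialize(LDAP_URI)
        conn.simple_bind_s()  # anonymous bind; pass credentials if required
        results = conn.search_s(GROUP_BASE, ldap.SCOPE_SUBTREE,
                                "(memberUid=%s)" % uid, ["cn"])
        return [attrs["cn"][0] for dn, attrs in results]

    print(groups_for("jdoe"))  # e.g. ['ngs', 'staff']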
Sorry, this is a bit random, but I'm kind of drowning here since I'm not
used to working with applications this huge. Thanks for your help and patience.
Cheers,
L-A
downloading bam and bai files from sam-to-bam wrapper
by Ryan Golhar
In my local instance of Galaxy, I'm trying to download the BAM and BAI
files produced by the SAM-to-BAM tool. I see the links for downloading
the Dataset and the bam_index.
BTW - I'm running Safari on Mac OS X.
When I download the Dataset, I get the BAM file. When I download the
bam_index, it gets renamed to Galaxy13-[SAM-to-BAM_on...].bam_index.html
The problem is the extension: it is .bam_index.html but should be
.bam.bai. The BAM file itself gets saved correctly with .bam.
I checked the SAM-to-BAM XML wrapper but couldn't determine where this
filename comes from or how to fix it. Has anyone else run into this?
Multiple galaxy instances
by Jean-Baptiste Denis
Hello everybody,
I'm in the process of providing Galaxy for multiple teams. I've already
set up a testing instance using the production setup page on the wiki
(Apache + SGE), and judging by the users' feedback it works quite well.
This setup is typically used for NGS work, dealing with data on an NFS
share without uploading it to the instance.
Why do I need multiple instances? Maybe I'm not using Galaxy correctly.
Correct me if I'm wrong.
My goal is to delegate the management of libraries/datasets to a Galaxy
admin of each team from the beginning: I do NOT want a SINGLE
independent super admin to manage access for multiple teams; it
doesn't scale.
The Galaxy instance and the underlying "galaxy" system user must have
access to the NGS data on the NFS (v3) share. This means that the galaxy
user must be in a group that has access to the data. I can then delegate
the process of managing datasets and libraries to a dedicated Galaxy
admin. This setup works quite well with a single instance. My job
as a sysadmin is reduced to Galaxy setup and maintenance: I'm not
involved in the library/dataset management.
This setup breaks down, however, if there is another team with data they
don't want to share with others (don't blame me for that): the galaxy
system user must access the data of the first team AND the second team,
which means the Galaxy admin of each team could access everything.
One solution to this problem would be an independent Galaxy super admin
with access to everything, who manages data access for each team. I
don't like this solution; as I said, it doesn't scale.
So another way to deal with this is to give each team its own Galaxy
instance (each running as a distinct galaxy system user) with a
dedicated Galaxy admin. Two possibilities:
- N Galaxy trees, each with its own tuned universe_wsgi.ini file
(dedicated path, port, database, etc.). The problem here is on the
sysadmin side: the update process must be repeated N times.
- A single Galaxy tree and N tuned universe_wsgi.ini files (dedicated
path, port, database, etc.). This seems best to me, but I need to know
whether Galaxy's internals can handle that kind of setup (see the sketch
after this list for what I mean).
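(To make the second possibility concrete, here is a hypothetical
per-team config excerpt for one shared tree; every path, port, and
connection string below is a made-up placeholder:)

    ; universe_team_a.ini -- one such file per team, same shared tree
    [server:main]
    use = egg:Paste#http
    host = 0.0.0.0
    port = 8081                    ; unique port per team

    [app:main]
    ; each team gets its own database and storage locations
    database_connection = postgres://galaxy_a:secret@dbhost/galaxy_team_a
    file_path = /nfs/team_a/galaxy/database/files
    new_file_path = /nfs/team_a/galaxy/database/tmp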
What do you think? Any input, remarks, or advice would be welcome!
Regards,
Jean-Baptiste
tmp directory not cleaned up?
by Ryan Golhar
I just noticed a lot of files in my Galaxy tmp directory. Since this
isn't a system tmp directory, the system cron scripts don't clean it up.
Is there a Galaxy cron script that can be used to clean up this
directory? (Something like the sketch below is what I have in mind.)
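(A minimal sketch of the cleanup I'm after, assuming the directory is
the one set by new_file_path; the path and the seven-day cutoff are
placeholders:)

    import os
    import time

    TMP_DIR = "/path/to/galaxy/database/tmp"  # placeholder; new_file_path
    MAX_AGE = 7 * 24 * 3600                   # seven days, in seconds

    now = time.time()
    for name in os.listdir(TMP_DIR):
        path = os.path.join(TMP_DIR, name)
        # Only remove plain files old enough to be considered stale.
        if os.path.isfile(path) and now - os.path.getmtime(path) > MAX_AGE:
            os.remove(path)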
Reload ".loc" files without restarting Galaxy system?
by Luobin Yang
Hi,
The "reload a tool's configuration" menu avoids restarting Galaxy system
when a tool's web interface changed, but it doesn't seem to reload the
".loc" files that a tool's XML file uses. I am wondering if it's possible to
reload the ".loc" file also?
Thanks,
Luobin
Filter data on any column and missing values
by Peter Cock
Hi all,
I have just found a problem using the "Filter data on any column using
simple expressions" tool, i.e. the files tools/stats/filters.xml and
tools/stats/filters.py
I have a six-column tabular file like this, where I have used \t for a
tab and \n for the newlines:
#ID\tHMM_Sprob_score\tSP_len\tRXLR_start\tEER_start\tRXLR?\n
gi|301087619|ref|XP_002894699.1|\t0.990\t21\t54\t64\tY\n
gi|301087623|ref|XP_002894700.1|\t0.997\t23\t\t\tN\n
gi|301087628|ref|XP_002894701.1|\t0.000\t24\t\t\tN\n
Breakdown of my data:
Column 1 - ID, mandatory string
Column 2 - HMM_Sprob_score, mandatory float
Column 3 - SP_len, mandatory integer
Column 4 - RXLR_start, optional integer
Column 5 - EER_start, optional integer
Column 6 - RXLR?, mandatory string (Y or N)
Notice that in my output, columns 4 and 5 can be empty or an integer.
I'm trying to filter this file using c6=='Y', i.e. column six is a
yes. This works (one row of output), but Galaxy tells me:
Info: Filtering with c6=='Y',
kept 100.00% of 4 lines.
Skipped 3 invalid lines starting at line #1: "#ID HMM_Sprob_score
SP_len RXLR_start EER_start RXLR?"
Then if I try to filter using c6=='N', i.e. column six is a no, it
fails (zero rows of output instead of three) and tells me:
kept 0.00% of 4 lines.
Skipped 3 invalid lines starting at line #1: "#ID HMM_Sprob_score
SP_len RXLR_start EER_start RXLR?"
Digging into the code, tools/stats/filters.py is given the list of
column types by Galaxy and (regardless of which columns are to be
used) attempts to cast them to integers, floats, etc.
It looks like Galaxy has decided that my columns 4 and 5 are integers
(based on the first data row), and therefore filters.py blindly calls
int(...) on all those entries, which fails on the empty cells.
I see several issues:
(a) The filters.py tool only really needs to cast the columns being
used in the filter expression (fairly easy to fix; see the sketch after
this list).
(b) The Galaxy column type detection seems a bit fragile (hard to
really fix without looking at all the data).
(c) Are there other tools that would break in a similar way to filters.py?
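(For (a), a minimal sketch of the defensive casting I mean; the function
name and arguments are hypothetical, not the tool's actual internals:)

    def cast_fields(fields, column_types, used_columns):
        # Cast a field only if its column is referenced by the filter
        # expression and is non-empty, so empty optional cells can no
        # longer raise ValueError from int() or float().
        casted = []
        for i, value in enumerate(fields):
            if i in used_columns and value != "":
                if column_types[i] == "int":
                    value = int(value)
                elif column_types[i] == "float":
                    value = float(value)
            casted.append(value)
        return casted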
Peter
Upload fails, webapp and job runner running on different machines
by Louise-Amélie Schmitt
Hi everyone,
I'm currently trying a new galaxy install with the webapp and the job
runner running on different machines, sharing a nfs volume where the
galaxy files are, and another one where the data is supposed to be
stored, as specified in the file_path and new_file_path values in the
universe_wsgi files. (one for the webapp and one for the runner, as
stated in the doc)
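(For reference, hypothetical excerpts of the two configs; the paths are
made-up placeholders, and I'm assuming the enable_job_running /
track_jobs_in_database split described in the scaling docs:)

    ; universe_wsgi.webapp.ini (excerpt)
    [app:main]
    enable_job_running = False
    track_jobs_in_database = True
    file_path = /nfs/data/galaxy/files   ; same shared path on both hosts
    new_file_path = /nfs/data/galaxy/tmp

    ; universe_wsgi.runner.ini (excerpt)
    [app:main]
    enable_job_running = True
    track_jobs_in_database = True
    file_path = /nfs/data/galaxy/files
    new_file_path = /nfs/data/galaxy/tmp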
Both instances run properly with no error messages, but when I try to
upload a file into a library, here's what happens:
The file's row appears in the library with the "uploading" note.
The job is registered in the database.
And that's all. Nothing changes afterwards. Nothing is saved anywhere.
This can't be a file-too-big issue, since the test file I'm working with
is only 3.5 kB.
I probably stupidly missed something, but I really don't see where the
problem might be.
Any idea?
Cheers
L-A
Using wget/curl with Tool Shed?
by Peter Cock
Hi all,
I'm trying to download and install something from the Galaxy
Tool Shed, http://community.g2.bx.psu.edu/, onto our server.
To do this, I'd like to copy/paste the URL of a tool's tarball
(or an individual XML file, etc.) from the browser on my desktop
into a terminal window logged in to our server, to give to wget
or curl.
e.g. this URL will give me bam_to_bigwig_0.0.1.tar.gz if opened
in a web browser:
http://community.g2.bx.psu.edu/common/download_tool?cntrller=tool&id=3b16...
However, the URL actually goes to an HTML page and then does some
JavaScript magic to redirect to the real file, so I can't easily
use this URL with wget or curl.
Is this something that you could fix in the website code?
For now I can just download the file on my desktop and then
scp it to the server (or ssh -X into the server and run a
browser from there), but this feels like an unnecessary hurdle.
Regards,
Peter
Community server bug and request
by Assaf Gordon
Tiny bug:
When viewing a file inside a tarball, the content type is forced to "text/plain".
That works for most files, but not for image files (example: the "Sequence Logo" tool has a jpg image in its tarball, and the browser displays it as text/plain full of binary characters).
This is done in "./lib/galaxy/webapps/community/controllers/tool.py", function "view_tool_file", line 206:
trans.response.set_content_type( 'text/plain' )
regardless of the file type.
The complicated solution would be to use the "mimetypes" library (http://docs.python.org/library/mimetypes.html) or python-magic (https://github.com/ahupp/python-magic), but a simpler workaround would be to just check the file name for "jpg/png/gif" extensions and set the content type appropriately (see the sketch below).
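(A minimal sketch of the mimetypes route; the helper name is made up,
and only the final set_content_type call mirrors the existing code:)

    import mimetypes

    def content_type_for(filename):
        # guess_type returns (None, None) for unknown extensions,
        # so fall back to the current behaviour of text/plain.
        guessed, _ = mimetypes.guess_type(filename)
        return guessed or 'text/plain'

    # in view_tool_file, instead of the hard-coded call:
    # trans.response.set_content_type( content_type_for( filename ) )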
Feature request:
A way to easily link to tools, so I can send them to other people or point them in the right direction.
Example:
Ross's WebLogo tool (great tool, BTW). If I want to tell someone where it is, the link is:
http://community.g2.bx.psu.edu/tool/browse_tools?sort=name&f-state=approv...
and the direct download link for the tarball is:
http://community.g2.bx.psu.edu/common/download_tool?cntrller=tool&id=c2f6...
Both are probably not stable links (they'll change if a new version is uploaded).
-gordon
"Hide Datasets" bug in workflow
by Assaf Gordon
Hi,
There's a small bug with the "hide dataset" action in the workflow editor: once any dataset has been marked as an output (by clicking the star icon), there's no way to show all the datasets again, even if the user manually un-checks every star icon for every workflow step.
Meaning:
If I have a workflow that *used* to have one step marked as an output, unmarking it (so that no dataset is marked as an output, implying all datasets should be kept in the history) doesn't work: Galaxy insists on hiding all the other datasets.
Steps to reproduce:
1. Import this workflow:
http://main.g2.bx.psu.edu/u/foobar/w/hiddendatasetstest2
2. Edit this workflow.
Notice that only the last two steps are marked as "output".
don't change anything, close with (or without) saving.
3. Run this workflow.
Notice that Steps 1,2,3,5,6 have "Action: Hide this dataset." - as expected.
Only steps 4,7 are marked as "output".
No need to actually execute the workflow.
4. Edit the workflow again.
Remove the "output" mark from steps 4,7 ("Add Column" and "compute").
No dataset is now marked as an output - at least from the user's POV,
it's implied that all datasets should be kept in the history (just like when editing a new workflow).
Save the workflow, close the workflow.
5. Run this workflow.
Notice that steps 1,2,3,5,6 STILL have "Action: Hide this dataset." - this is the bug.
So the only workaround is to run the workflow, then use "extract workflow from history" to get a "clean" copy of the workflow without any datasets hidden.
Thanks to Marek Kudla for meticulously experimenting with this issue.
-gordon