Hi Galaxy team,
The following are just a few thoughts after using Galaxy for a while, which might be interesting for future developments...
1. Interface consistency: "Save"
* There are three nice icons at the top of all my dataset items in the history panel on the right for view, edit and delete. So why is there no save icon at the same location instead of a link further down?
* When I edit a workflow there is a save button above the canvas and there is another one in the panel on the right when I edit the properties of a specific workflow item. As far as I can tell these buttons are not completely redundant, but why do I need two save buttons?
2. Provenance data
* Reproducibility is important and it is nice that Galaxy automatically captures your analysis in histories, but after, let's say, a few months I might want to have a second look at my data to figure out exactly what I did and how a certain combination of data and tools produced a certain result. For example, if I executed a workflow once every two weeks on updated data for many months, I might want to retrieve the history for a certain version of a database. So I might want to say: give me the histories containing datasets tagged as Ensembl version 48, or UniProt 3, or some version of a reference assembly, etc. Or I might want to see how the results changed for a certain gene over time as a result of updated databases and/or tools. So I might want to say to Galaxy: show me all histories containing ENSGALG000012589 or NM_45689725. Hence, I'd love to be able to search histories. In addition, to make it a bit easier to trace things in browse mode, it would be nice if the date a history was last modified were visible. Currently I only have the age of the history in minutes, hours or days. That is convenient for recent items, but for things from longer ago a date makes more sense to me...
* There is a fixed "Database/Build" popup that I can use to tag my datasets, but this feels artificially limited. Is there any reason why the species and database version cannot be separate items? If there were a popup first to select a species, followed by a second popup to select the genome assembly version, the lists could be a lot smaller and hence easier to navigate. In addition, there are cases where I do have a species but don't have an assembly, or where there are additional version numbers to keep track of. For example, I have lots of Ensembl data. Ensembl does not have a single version number, but three. There is one for the database schema, one for the assembly and one for the annotation/gene build. The current version for mouse is, for example, 55 37 h, where 55 is the release and schema version number, 37 the assembly and "h" the version of the gene build. In addition, I recently moved to a proteomics group and might want to capture DB version numbers for species without a reference assembly. For example, I might know the species name and the fact that I'm using UniProt 15.5, but currently I cannot easily capture that in a consistent way. (I know I might add this to the "info" for a dataset, but it's free text, with all kinds of possible spelling variants as a result...)
Cheers,
Pi
------------------------------------------------------------- Biomolecular Mass Spectrometry and Proteomics Utrecht University
Visiting address: H.R. Kruyt building room O607 Padualaan 8 3584 CH Utrecht The Netherlands
Mail address: P.O. box 80.082 3508 TB Utrecht The Netherlands
phone: +31 (0)6-143 66 783 email: pieter.neerincx@gmail.com skype: pieter.online ------------------------------------------------------------
Hello Pieter,
Thanks very much for your proposals - information like this is extremely valuable to us, and helps define our development road map. I have opened the following tickets for each of the items you have proposed, so you can "follow" them in bitbucket if you want.
http://bitbucket.org/galaxy/galaxy-central/issue/107/interface-consistency-s...
http://bitbucket.org/galaxy/galaxy-central/issue/108/proposed-enhancements-f...
http://bitbucket.org/galaxy/galaxy-central/issue/109/split-selection-for-db-...
Thanks again,
Greg Von Kuster Galaxy Development Team
On Jul 15, 2009, at 8:43 AM, Pieter Neerincx wrote:
The following are just a few thoughts after using Galaxy for a while, which might be interesting for future developments...
Thanks for the suggestions, lots of good ideas here.
- There are three nice icons at the top of all my dataset items in the history panel on the right for view, edit and delete. So why is there no save icon at the same location instead of a link further down?
This is simply because there is limited space. We've made the three most frequently needed options available as icons, but there is not space to include all options (save is just one, for certain datatypes we also have browse links and other options). I've played with a few ideas on how to make these options easier to get to. The most likely candidate is adding a popup menu, but this needs further study since any change to the history has potential to confuse current users.
- When I edit a workflow there is a save button above the canvas and there is another one in the panel on the right when I edit the properties of a specific workflow item. As far as I can tell these buttons are not completely redundant, but why do I need two save buttons?
The reason is that one save button just saves and validates the parameters for a single step, while the other saves and validates parameters for the entire workflow. We want to make sure that each step is validated at the time it is being edited. Otherwise we would need to guide the user through a complex validation process at save time.
I'm not happy with the way this works right now either, but again I have not found a better solution. We could autosave the forms whenever a user changes a field, but this has some tricky UI implications.
Ensembl version 48, or UniProt 3, or some version of a reference assembly, etc. Or I might want to see how the results changed for a certain gene over time as a result of updated databases and/or tools. So I might want to say to Galaxy: show me all histories containing ENSGALG000012589 or NM_45689725. Hence, I'd love to be able to search histories.
This is a great idea, but computationally complex. If we just want to search the metadata in the histories (title, info, etc) that is pretty straightforward. Searching the actual data would be very costly, since it would require some sort of full text indexing of every dataset in Galaxy. Of course, this could be done in the background rather than in realtime... worth considering.
What would make this work better is if we could have additional metadata automatically added to datasets we get from external datasources that would include the sort of information people would be interested in searching for.
In addition, to make it a bit easier to trace things in browse mode, it would be nice if the date a history was last modified were visible. Currently I only have the age of the history in minutes, hours or days. That is convenient for recent items, but for things from longer ago a date makes more sense to me...
This is no problem, we can display a date if it is more than a few days old.
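A minimal sketch of that display rule, assuming a hypothetical three-day cutoff (the thread only says "more than a few days") and a made-up helper name, `format_last_modified`; this is not Galaxy's actual code:

```python
from datetime import datetime, timedelta

# Hypothetical cutoff: the thread only says "more than a few days old".
DATE_CUTOFF = timedelta(days=3)


def format_last_modified(modified, now):
    """Return a relative age for recent items and a calendar date for older ones."""
    age = now - modified
    if age >= DATE_CUTOFF:
        # Old enough that an absolute date is easier to read than an age.
        return modified.strftime("%Y-%m-%d")
    minutes = int(age.total_seconds() // 60)
    if minutes < 60:
        return f"{minutes} minutes ago"
    hours = minutes // 60
    if hours < 24:
        return f"{hours} hours ago"
    return f"{hours // 24} days ago"


now = datetime(2009, 7, 15, 12, 0)
print(format_last_modified(datetime(2009, 7, 15, 11, 30), now))
print(format_last_modified(datetime(2009, 3, 1, 9, 0), now))
```

Passing `now` explicitly rather than calling `datetime.now()` inside the function keeps the rule easy to test.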
- There is a fixed "Database/Build" popup that I can use to tag my datasets, but this feels artificially limited. Is there any reason why the species and database version cannot be separate items? If there were a popup first to select a species, followed by a second popup to select the genome assembly version, the lists could be a lot smaller and hence easier to navigate.
Absolutely. We're playing with a variety of ways to make database/build easier to work with. The main problem is that since we draw builds from multiple sources, getting the appropriate data to group them in a reliable way is challenging.
In addition, there are cases where I do have a species but don't have an assembly, or where there are additional version numbers to keep track of. For example, I have lots of Ensembl data. Ensembl does not have a single version number, but three. There is one for the database schema, one for the assembly and one for the annotation/gene build. The current version for mouse is, for example, 55 37 h, where 55 is the release and schema version number, 37 the assembly and "h" the version of the gene build. In addition, I recently moved to a proteomics group and might want to capture DB version numbers for species without a reference assembly. For example, I might know the species name and the fact that I'm using UniProt 15.5, but currently I cannot easily capture that in a consistent way. (I know I might add this to the "info" for a dataset, but it's free text, with all kinds of possible spelling variants as a result...)
Builds and versioning are one of the trickiest problems we have to deal with. I'd love to hear your suggestions on how you'd like the tracking of this information to work.
Thanks, James
Hi James,
On 15 Jul 2009, at 3:12 PM, James Taylor wrote:
On Jul 15, 2009, at 8:43 AM, Pieter Neerincx wrote:
The following are just a few thoughts after using Galaxy for a while, which might be interesting for future developments...
Thanks for the suggestions, lots of good ideas here.
:)
- There are three nice icons at the top of all my dataset items in the history panel on the right for view, edit and delete. So why is there no save icon at the same location instead of a link further down?
This is simply because there is limited space. We've made the three most frequently needed options available as icons, but there is not space to include all options (save is just one, for certain datatypes we also have browse links and other options). I've played with a few ideas on how to make these options easier to get to. The most likely candidate is adding a popup menu, but this needs further study since any change to the history has potential to confuse current users.
I guessed it was due to space constraints and apart from buying larger screens I don't have a quick solution... I like the current 3 icons as they give me quick access to functionality without expanding the history item :), so I hope these will not be hidden in a popup menu requiring extra clicking, but I like the idea of an extra icon for a popup menu that gives me access to the less frequently used options. Whatever you choose, anything that makes it more consistent should make it easier to use :)
- When I edit a workflow there is a save button above the canvas and there is another one in the panel on the right when I edit the properties of a specific workflow item. As far as I can tell these buttons are not completely redundant, but why do I need two save buttons?
The reason is that one save button just saves and validates the parameters for a single step, while the other saves and validates parameters for the entire workflow. We want to make sure that each step is validated at the time it is being edited. Otherwise we would need to guide the user through a complex validation process at save time.
I'm not happy with the way this works right now either, but again I have not found a better solution. We could autosave the forms whenever a user changes a field, but this has some tricky UI implications.
I understand. How about naming the button that "saves" and validates params for a single step "update" or "apply"? Instead of saving the workflow after each step, you could work with a copy of the workflow. When changes are applied to a single step, only the copy is updated. Only when you click save for the entire workflow does the updated copy overwrite the original. This way a user can also explore several modifications and, when the result is not satisfying, cancel all mods by simply not saving the workflow.
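A rough sketch of that apply/save split, assuming a workflow can be modelled as a plain dict of step name to parameters. `WorkflowEditor` and its methods are hypothetical names for illustration, not part of Galaxy, and the validation is a stand-in:

```python
import copy


class WorkflowEditor:
    """Edit a deep copy of a workflow; the original changes only on save()."""

    def __init__(self, workflow):
        self.original = workflow                 # dict: step name -> parameter dict
        self.working = copy.deepcopy(workflow)   # all edits land here first

    def apply_step(self, step, params):
        """Validate and update one step in the working copy ("apply")."""
        if step not in self.working:
            raise KeyError(f"unknown step: {step}")
        # Stand-in validation: every parameter must have a value.
        if any(value is None for value in params.values()):
            raise ValueError(f"step {step!r} has unset parameters")
        self.working[step] = dict(params)

    def save(self):
        """Overwrite the original with the working copy ("save")."""
        self.original.clear()
        self.original.update(copy.deepcopy(self.working))

    def cancel(self):
        """Discard all applied changes by resetting the working copy."""
        self.working = copy.deepcopy(self.original)


workflow = {"filter": {"min_score": 10}}
editor = WorkflowEditor(workflow)
editor.apply_step("filter", {"min_score": 20})
print(workflow["filter"]["min_score"])  # the original is untouched until save()
editor.save()
print(workflow["filter"]["min_score"])
```

Each step is still validated when it is applied, so the final save does not need a separate validation pass over every step.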
Ensembl version 48, or UniProt 3, or some version of a reference assembly, etc. Or I might want to see how the results changed for a certain gene over time as a result of updated databases and/or tools. So I might want to say to Galaxy: show me all histories containing ENSGALG000012589 or NM_45689725. Hence, I'd love to be able to search histories.
This is a great idea, but computationally complex. If we just want to search the metadata in the histories (title, info, etc) that is pretty straightforward. Searching the actual data would be very costly, since it would require some sort of full text indexing of every dataset in Galaxy. Of course, this could be done in the background rather than in realtime... worth considering.
I understand that searching the actual data could potentially be very resource intensive, but I would already be very happy if I could search the metadata. Simply being able to search for the analysis that was based on a certain DB version would be really nice. Right now I have to look up release dates for DB versions and, based on that, examine histories with a date near the DB release date.
What would make this work better is if we could have additional metadata automatically added to datasets we get from external datasources that would include the sort of information people would be interested in searching for.
That would be great!
In addition, to make it a bit easier to trace things in browse mode, it would be nice if the date a history was last modified were visible. Currently I only have the age of the history in minutes, hours or days. That is convenient for recent items, but for things from longer ago a date makes more sense to me...
This is no problem, we can display a date if it is more than a few days old.
I'm a bit confused: Does that mean you could add this to a future Galaxy version or is it already possible with the current version and did I fail to find some configuration parameter?
- There is a fixed "Database/Build" popup that I can use to tag my datasets, but this feels artificially limited. Is there any reason why the species and database version cannot be separate items? If there were a popup first to select a species, followed by a second popup to select the genome assembly version, the lists could be a lot smaller and hence easier to navigate.
Absolutely. We're playing with a variety of ways to make database/build easier to work with. The main problem is that since we draw builds from multiple sources, getting the appropriate data to group them in a reliable way is challenging.
In addition, there are cases where I do have a species but don't have an assembly, or where there are additional version numbers to keep track of. For example, I have lots of Ensembl data. Ensembl does not have a single version number, but three. There is one for the database schema, one for the assembly and one for the annotation/gene build. The current version for mouse is, for example, 55 37 h, where 55 is the release and schema version number, 37 the assembly and "h" the version of the gene build. In addition, I recently moved to a proteomics group and might want to capture DB version numbers for species without a reference assembly. For example, I might know the species name and the fact that I'm using UniProt 15.5, but currently I cannot easily capture that in a consistent way. (I know I might add this to the "info" for a dataset, but it's free text, with all kinds of possible spelling variants as a result...)
Builds and versioning are one of the trickiest problems we have to deal with. I'd love to hear your suggestions on how you'd like the tracking of this information to work.
I know this isn't easy... Preferably you would like to be able to do things like search for analyses that were done with DB versions older or newer than a certain version. If all data sources simply used integers and incremented by 1 for each release, it would be easy. A partial solution might be capturing release dates in a separate field, as you can easily standardise the format for a date field and search it :). What I would love to see is separate metadata fields for species, database, version number and release date. Of these fields, at least the species, database and release date fields could use a thesaurus / standardised formatting, making them easily searchable...
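To make the idea concrete, here is a small sketch of such a record with the four separate fields plus an optional assembly. The class name, field names and helper are hypothetical, and the release dates in the example are placeholders, not the real release dates of these databases:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SourceVersion:
    species: str                     # ideally drawn from a controlled vocabulary
    database: str                    # e.g. "Ensembl" or "UniProt"
    version: str                     # free-form, e.g. "55 37 h" or "15.5"
    release_date: date               # standardised, so it can be compared and sorted
    assembly: Optional[str] = None   # absent for species without a reference assembly


def released_before(records, cutoff):
    """Return the records whose source was released before the cutoff date."""
    return [r for r in records if r.release_date < cutoff]


records = [
    # Placeholder dates for illustration only.
    SourceVersion("Mus musculus", "Ensembl", "55 37 h", date(2009, 7, 1), assembly="37"),
    SourceVersion("Gallus gallus", "UniProt", "15.5", date(2009, 6, 1)),
]
for record in released_before(records, date(2009, 6, 15)):
    print(record.database, record.version)
```

The version string stays free-form because the sources number their releases differently; ordering relies on the date field alone.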
Cheers,
Pi
Hi Pieter,
A couple of months ago you requested tagging and searching of histories:
Ensembl version 48, or UniProt 3, or some version of a reference assembly, etc. Or I might want to see how the results changed for a certain gene over time as a result of updated databases and/or tools. So I might want to say to Galaxy: show me all histories containing ENSGALG000012589 or NM_45689725. Hence, I'd love to be able to search histories.
FYI, these features have been introduced into Galaxy. It's now possible to tag histories and search histories via tags. These features are in the galaxy-central codebase and will be on the public Galaxy server shortly.
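As an illustration only, tag-based history search can be sketched as a small in-memory index. `HistoryTagIndex`, the history names and the tag strings below are all made up for the example; this is not the galaxy-central implementation or its API:

```python
from collections import defaultdict


class HistoryTagIndex:
    """Map normalised tags to the names of the histories that carry them."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def tag(self, history_name, *tags):
        for t in tags:
            # Normalise so "Ensembl:48" and "ensembl:48" are the same tag.
            self._by_tag[t.strip().lower()].add(history_name)

    def search(self, *tags):
        """Return names of histories carrying all the given tags, sorted."""
        if not tags:
            return []
        sets = [self._by_tag.get(t.strip().lower(), set()) for t in tags]
        return sorted(set.intersection(*sets))


index = HistoryTagIndex()
index.tag("run-2009-03", "Ensembl:48", "mouse")
index.tag("run-2009-07", "Ensembl:55", "mouse")
print(index.search("mouse", "ensembl:55"))
```

Normalising tags on the way in addresses the "spelling variants" concern raised earlier in the thread, at least for case and surrounding whitespace.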
Best, J.