I'm having a tough time keeping track of which data is which after analysis. I can do a bunch of work customizing each tool and each workflow, renaming results, etc., but I think it might be a lot easier to manage if step names were based on the titles of the history items instead of "data 2" or whatever.

Has this been tried and rejected for some reason? Would a pull request implementing this change be welcomed? Am I just "doing it wrong"? Any suggestions are welcome.

Brad
--
Brad Langhorst
New England Biolabs
langhorst@neb.com
Brad,
But I think it might be a lot easier to manage if step names were based on the titles of the history items instead of "data 2" or whatever.
Has this been tried and rejected for some reason?
It's been tried and rejected because dataset names get very long and unwieldy, e.g. "SAM/BAM Alignment Summary Metrics on TopHat on Filter FASTQ on my_rna_seq_reads".
Would a pull request implementing this change be welcomed?
What we imagine would help is a way to easily show/find a dataset's analysis path -- its parents and its descendants -- so that it's possible to trace the datasets/tools used to create a dataset and the tools/datasets subsequently used. This is something we'd like to do but haven't put much effort into yet. Community contributions in this space would be great.

Best,
J.
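As a rough illustration of the kind of lineage tracing described above, here is a minimal Python sketch. The Dataset class, its attributes, and the example names are all invented for illustration; this is not Galaxy's actual data model.

# Hypothetical, minimal lineage model -- not Galaxy's real data model.
class Dataset:
    def __init__(self, name, tool=None, inputs=None):
        self.name = name            # history item title, e.g. "TopHat on data 2"
        self.tool = tool            # tool that produced this dataset, if any
        self.inputs = inputs or []  # parent datasets fed into that tool
        self.children = []          # filled in as children are created
        for parent in self.inputs:
            parent.children.append(self)

def ancestors(dataset):
    """Yield (tool, parent) pairs all the way back to the original uploads."""
    for parent in dataset.inputs:
        yield dataset.tool, parent
        yield from ancestors(parent)

def descendants(dataset):
    """Yield every dataset later derived from this one."""
    for child in dataset.children:
        yield child
        yield from descendants(child)

# Example analysis path: reads -> trim -> map -> summary metrics
reads = Dataset("my_rna_seq_reads")
trimmed = Dataset("Filter FASTQ on my_rna_seq_reads", "Filter FASTQ", [reads])
mapped = Dataset("TopHat on data 2", "TopHat", [trimmed])
metrics = Dataset("SAM/BAM Alignment Summary Metrics on data 3",
                  "SAM/BAM Alignment Summary Metrics", [mapped])

for tool, parent in ancestors(metrics):
    print(f"{tool} ran on: {parent.name}")
print([d.name for d in descendants(trimmed)])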
Hi Jeremy and Ross,

I agree that the current chaining mechanism would get very long after even a few steps. Usually the most important information is the first and last step, e.g. the TopHat run should be called "TopHat on SOLiD 24A" and the alignment stats should be "SAM/BAM Summary Metrics of SOLiD 24A", with the rest of the tools in the chain identified in the "more information" box.

This would also give graph-generating tools a fighting chance to present something useful in any graphs generated, e.g. a GC Bias Plot of SOLiD 24A could have a title of "SOLiD 24A" instead of "dataset_234.dat".

What do you think of this first-last model?

Brad
--
Brad Langhorst
New England Biolabs
langhorst@neb.com
Usually the most important information is the first and last step, e.g. the TopHat run should be called "TopHat on SOLiD 24A"
The alignment stats should be "SAM/BAM Summary Metrics of SOLiD 24A", with the rest of the tools in the chain identified in the "more information" box.
This would also give graph-generating tools a fighting chance to present something useful in any graphs generated, e.g. a GC Bias Plot of SOLiD 24A could have a title of "SOLiD 24A" instead of "dataset_234.dat".
What do you think of this first-last model?
This model breaks down during experimentation. E.g., let's say three different methods for trimming a FASTQ dataset are tried before mapping with Bowtie. Currently, the Bowtie runs are named differently because each trimmed dataset is a unique input. Using the first-last model, all the datasets are named the same, and it is not possible to differentiate between them without looking at the inputs, which requires clicking on the rerun/info button and finding the input(s). The current approach used by Galaxy lists the inputs in the dataset title to avoid these issues.

Datasets with the same name become more problematic as more steps are added between first and last because, while they have the same name, the steps taken to produce them may be very different.

The first-last model could be nice for workflows, though, perhaps as an extension of the "rename dataset" actions or a kind of global "rename dataset" action.

J.
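To make the naming collision concrete, here is a small sketch comparing the two schemes for three trimming tools feeding the same mapper. The tool and dataset names are invented for illustration.

trimmers = ["Trim Galore", "FASTQ Trimmer", "Sickle"]   # hypothetical trimming tools
raw = "SOLiD 24A"

# Current Galaxy-style chained names: each Bowtie output stays distinguishable.
chained = [f"Bowtie on {trimmer} on {raw}" for trimmer in trimmers]

# First-last names: every Bowtie output collapses to the same label.
first_last = [f"Bowtie on {raw}" for _ in trimmers]

print(chained)          # three distinct names
print(set(first_last))  # a single name -- the three runs are indistinguishable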
Hi all,

Is there a way, or better yet a coded example, of how to list the options for an input parameter using the contents of a shared data library? Up until now I've used Python scripts to get the input file options from a directory listing for a hardcoded path on the server, but it'd be much nicer to draw the list from a Shared Data folder, for maintenance and metadata purposes.

Cheers,
Paul
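One possible direction, sketched below: Galaxy tool XML supports a <code file="..."> tag together with a dynamic_options attribute on select parameters, which lets a Python function in the code file build the option list (a legacy hook, deprecated in more recent releases). The function name, the library name "My Shared Data", the model attribute names, and the assumption that a trans-like object is passed in are all guesses about Galaxy's SQLAlchemy model and the hook's evaluation context, so they will likely need adjusting for your Galaxy version.

# Sketch of a tool <code> file that builds select-list options from a data
# library. Attribute and relationship names below are assumptions about
# Galaxy's model and may differ in your release.

def get_library_dataset_options(trans):
    """Return (label, value, selected) tuples, the format Galaxy's
    dynamic_options mechanism expects for select parameters."""
    options = []
    library = (
        trans.sa_session.query(trans.app.model.Library)
        .filter(trans.app.model.Library.name == "My Shared Data")  # assumed library name
        .first()
    )
    if library is None:
        return options
    # Only walks the root folder; subfolders would need a recursive walk.
    for library_dataset in library.root_folder.datasets:  # assumed relationship
        ldda = library_dataset.library_dataset_dataset_association  # assumed attribute
        # Label with the dataset's display name; pass its file path as the value.
        options.append((ldda.name, ldda.file_name, False))
    return options

An alternative that avoids querying the model at all is to keep a tabular .loc file in sync with the library and point the parameter at it with <options from_file="...">, at the cost of maintaining that file by hand.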
participants (3)
- Jeremy Goecks
- Langhorst, Brad
- Paul Gordon