Which ID ('id', 'workflow_id', and 'dataset_id') should be used?
Hi all, I would appreciate your help in understanding the 'id' key returned from the API. I am using Galaxy version 15.03 and bioblend version 0.8.0. Example: I have highlighted the id and related fields in bold and red.
workflowClient.get_invocations('f7bb1edd6b95db62') [{u'inputs': {u'1': {u'src': u'hda', u'id': u'06d9fe130fbe098e'}}, u'update_time': u'2017-05-17T03:09:10', u'uuid': u'fd066a98-3aad-11e7-90e9-1cc1de6d5ef4', u'history_id': u'b8a0d6158b9961df', u'state': u'scheduled', *u'workflow_id': u'915ae9a80309f157'*, u'steps': ... u'model_class': u'WorkflowInvocation', *u'id': u'8c49be448cfe29bc'*}]
Why is the '*workflow_id*' different from the one I passed to the function? And why is the '*workflow_id*' I passed not found anywhere in the return value?
historyClient.show_dataset(hid,'468b2dfe96a5a9a1') {u'accessible': True, u'resubmitted': False, u'create_time': u'2017-05-17T03:04:02', u'download_url': u'/api/histories/b8a0d6158b9961df/contents/468b2dfe96a5a9a1/display', u'file_size': 545, *u'dataset_id': u'56c890cbef28295c', u'id': u'468b2dfe96a5a9a1'*, u'misc_info': u'uploaded fastqsanger file', u'hda_ldda': u'hda', u'metadata_sequences': 5, u'state': u'ok', u'display_types': [], u'display_apps': [], u'type': u'file', u'file_path': None, u'misc_blurb': u'5 sequences', u'peek': u'<table cellspacing="0" cellpadding="3"><tr><td>@1</td></tr><tr><td>tccacaagccattgtgtgtaattaaccactaattgtgtataagtttaaact</td></tr><tr><td>+</td></tr><tr><td>IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII</td></tr><tr><td>@2</td></tr><tr><td>tccacaagccattgtgtgtaattaaccactaattgtgtataagtttaaact</td></tr></table>', u'update_time': u'2017-05-17T03:04:06', u'data_type': u'galaxy.datatypes.sequence.FastqSanger', u'tags': [], u'deleted': False, u'history_id': u'b8a0d6158b9961df', u'meta_files': [], u'genome_build': u'?', u'hid': 1, u'model_class': u'HistoryDatasetAssociation', u'metadata_data_lines': 20, u'file_ext': u'fastqsanger', u'annotation': None, u'metadata_dbkey': u'?', u'history_content_type': u'dataset', *u'name': u'a_1.fastq'*, u'extension': u'fastqsanger', u'visible': True, u'url': u'/api/histories/b8a0d6158b9961df/contents/468b2dfe96a5a9a1', u'uuid': u'aa6dcf49-6fe9-49e0-8064-c8bc275a37d5', u'visualizations': [], u'purged': False, u'api_type': u'file'}
historyClient.show_dataset(hid,'56c890cbef28295c') {u'accessible': True, u'resubmitted': False, u'create_time': u'2017-05-17T02:59:27', u'file_size': 64, *u'dataset_id': u'9ccf9e6f1cf4d1fa', u'id': u'56c890cbef28295c'*, u'misc_info': u'##fileformat=VCFv4.1\n##FILTER=<ID=PASS,Description="All filters passed">\n##fileDate=20170517\n##source=freeBayes v0.9.20\n##reference=localref.fa\n##phasing=none\n##commandline="freebayes --bam localbam_0.bam --fasta-reference localref.fa --vcf /home/sphadmi', u'hda_ldda': u'hda', u'download_url': u'/api/histories/06ec17aefa2d49dd/contents/56c890cbef28295c/display', u'state': u'ok', u'display_types': [], u'display_apps': [], u'type': u'file', u'file_path': None, u'misc_blurb': u'0 lines', u'peek': u'<table cellspacing="0" cellpadding="3"><tr><td>#Calculation and writing of high density regions has completed.</td></tr></table>', u'update_time': u'2017-05-17T02:59:36', u'data_type': u'galaxy.datatypes.data.Text', u'tags': [], u'deleted': False, u'history_id': u'06ec17aefa2d49dd', u'meta_files': [], u'genome_build': u'?', u'hid': 44, u'model_class': u'HistoryDatasetAssociation', u'metadata_data_lines': None, u'file_ext': u'txt', u'annotation': None, u'metadata_dbkey': u'?', u'history_content_type': u'dataset', *u'name': u'High density regions',* u'extension': u'txt', u'visible': False, u'url': u'/api/histories/06ec17aefa2d49dd/contents/56c890cbef28295c', u'uuid': u'8b8c70a4-cd2e-43d3-bc77-b06511557c96', u'visualizations': [], u'purged': False, u'api_type': u'file'}
Similarly, here the '*dataset_id*' is different from the one I passed to the *show_dataset* method. If I check the '*dataset_id*' from the first call, it points to a different file! Please let me know which of these ids should be used and what the purpose of the other id is. Thanks for your help and time! Best, Aarthi
Hi Aarthi, thanks for your email, see my replies inline. On 17/05/17 08:21, Aarthi Mohan wrote:
Hi all,
I would appreciate your help in understanding the 'id' key returned from the API. I am using Galaxy version 15.03 and bioblend version 0.8.0.
Example:
I have highlighted the id and related fields with bold and red.
>>> workflowClient.get_invocations('f7bb1edd6b95db62') [{u'inputs': {u'1': {u'src': u'hda', u'id': u'06d9fe130fbe098e'}}, u'update_time': u'2017-05-17T03:09:10', u'uuid': u'fd066a98-3aad-11e7-90e9-1cc1de6d5ef4', u'history_id': u'b8a0d6158b9961df', u'state': u'scheduled', *u'workflow_id': u'915ae9a80309f157'*, u'steps': ... u'model_class': u'WorkflowInvocation', *u'id': u'8c49be448cfe29bc'*}]
Why is the '/workflow_id/' different from the one I passed to the function? And why is the '/workflow_id/' I passed not found anywhere in the return value?
The confusion comes from the API mixing two concepts that Galaxy uses to manage workflows: "stored workflows" and "workflows". A stored workflow represents a workflow throughout its life (it stores the name, description, owner, whether it is deleted or published, ...), while a workflow is a particular version of a stored workflow, with the description of its inputs, steps and subworkflows. Every time you modify and save a stored workflow in the UI, a new workflow is generated and associated with the stored workflow. The stored workflow is always linked to the latest workflow version.
The ids used to interact with the API are the stored workflow ids ('f7bb1edd6b95db62' in your example above), while get_invocations() returns the workflow id ('915ae9a80309f157' in your case). That's because an invocation derives from a particular version of the workflow. It may be good to extend the API to also return the stored workflow id.
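To make the stored workflow / workflow distinction concrete, here is a minimal toy model in plain Python. It is only a sketch of the relationship described above, not Galaxy's actual model code: the class names, the `save()` and `invoke()` helpers, the older version id and the step names are all hypothetical. Only the three ids 'f7bb1edd6b95db62', '915ae9a80309f157' and '8c49be448cfe29bc' come from the output quoted in this thread.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Workflow:
    """One saved version of a workflow (its inputs, steps, subworkflows)."""
    id: str
    steps: List[str]


@dataclass
class StoredWorkflow:
    """The long-lived record (name, owner, ...) plus every saved version."""
    id: str
    name: str
    versions: List[Workflow] = field(default_factory=list)

    def save(self, workflow: Workflow) -> None:
        # Every save in the editor adds a new version to the stored workflow.
        self.versions.append(workflow)

    @property
    def latest_workflow(self) -> Workflow:
        # The stored workflow is always linked to its most recent version.
        return self.versions[-1]


@dataclass
class WorkflowInvocation:
    id: str
    workflow_id: str  # id of the *version* that was run, not of the stored workflow


def invoke(stored: StoredWorkflow, invocation_id: str) -> WorkflowInvocation:
    """Hypothetical helper: an invocation is tied to the version that was run."""
    return WorkflowInvocation(id=invocation_id,
                              workflow_id=stored.latest_workflow.id)


# The stored workflow id is what you pass to the API.
stored = StoredWorkflow(id="f7bb1edd6b95db62", name="example")
stored.save(Workflow(id="0000000000000001", steps=["input"]))           # hypothetical older version
stored.save(Workflow(id="915ae9a80309f157", steps=["input", "tool"]))   # hypothetical steps

invocation = invoke(stored, "8c49be448cfe29bc")
print(invocation.workflow_id)               # id of the version that ran
print(invocation.workflow_id == stored.id)  # the stored workflow id is a different value
```

In this model the invocation only remembers the version it ran, which is why the stored workflow id passed to get_invocations() does not show up in the result.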
>>> historyClient.show_dataset(hid,'468b2dfe96a5a9a1') {u'accessible': True, u'resubmitted': False, u'create_time': u'2017-05-17T03:04:02', u'download_url': u'/api/histories/b8a0d6158b9961df/contents/468b2dfe96a5a9a1/display', u'file_size': 545, *u'dataset_id': u'56c890cbef28295c', u'id': u'468b2dfe96a5a9a1'*, u'misc_info': u'uploaded fastqsanger file', u'hda_ldda': u'hda', u'metadata_sequences': 5, u'state': u'ok', u'display_types': [], u'display_apps': [], u'type': u'file', u'file_path': None, u'misc_blurb': u'5 sequences', u'peek': u'<table cellspacing="0" cellpadding="3"><tr><td>@1</td></tr><tr><td>tccacaagccattgtgtgtaattaaccactaattgtgtataagtttaaact</td></tr><tr><td>+</td></tr><tr><td>IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII</td></tr><tr><td>@2</td></tr><tr><td>tccacaagccattgtgtgtaattaaccactaattgtgtataagtttaaact</td></tr></table>', u'update_time': u'2017-05-17T03:04:06', u'data_type': u'galaxy.datatypes.sequence.FastqSanger', u'tags': [], u'deleted': False, u'history_id': u'b8a0d6158b9961df', u'meta_files': [], u'genome_build': u'?', u'hid': 1, u'model_class': u'HistoryDatasetAssociation', u'metadata_data_lines': 20, u'file_ext': u'fastqsanger', u'annotation': None, u'metadata_dbkey': u'?', u'history_content_type': u'dataset', *u'name': u'a_1.fastq'*, u'extension': u'fastqsanger', u'visible': True, u'url': u'/api/histories/b8a0d6158b9961df/contents/468b2dfe96a5a9a1', u'uuid': u'aa6dcf49-6fe9-49e0-8064-c8bc275a37d5', u'visualizations': [], u'purged': False, u'api_type': u'file'}
>>> historyClient.show_dataset(hid,'56c890cbef28295c') {u'accessible': True, u'resubmitted': False, u'create_time': u'2017-05-17T02:59:27', u'file_size': 64, *u'dataset_id': u'9ccf9e6f1cf4d1fa', u'id': u'56c890cbef28295c'*, u'misc_info': u'##fileformat=VCFv4.1\n##FILTER=<ID=PASS,Description="All filters passed">\n##fileDate=20170517\n##source=freeBayes v0.9.20\n##reference=localref.fa\n##phasing=none\n##commandline="freebayes --bam localbam_0.bam --fasta-reference localref.fa --vcf /home/sphadmi', u'hda_ldda': u'hda', u'download_url': u'/api/histories/06ec17aefa2d49dd/contents/56c890cbef28295c/display', u'state': u'ok', u'display_types': [], u'display_apps': [], u'type': u'file', u'file_path': None, u'misc_blurb': u'0 lines', u'peek': u'<table cellspacing="0" cellpadding="3"><tr><td>#Calculation and writing of high density regions has completed.</td></tr></table>', u'update_time': u'2017-05-17T02:59:36', u'data_type': u'galaxy.datatypes.data.Text', u'tags': [], u'deleted': False, u'history_id': u'06ec17aefa2d49dd', u'meta_files': [], u'genome_build': u'?', u'hid': 44, u'model_class': u'HistoryDatasetAssociation', u'metadata_data_lines': None, u'file_ext': u'txt', u'annotation': None, u'metadata_dbkey': u'?', u'history_content_type': u'dataset', *u'name': u'High density regions',* u'extension': u'txt', u'visible': False, u'url': u'/api/histories/06ec17aefa2d49dd/contents/56c890cbef28295c', u'uuid': u'8b8c70a4-cd2e-43d3-bc77-b06511557c96', u'visualizations': [], u'purged': False, u'api_type': u'file'}
Similarly, here the '/dataset_id/' is different from the one I passed to the _show_dataset_ method. If I check the '/dataset_id/' from the first call, it points to a different file!
There's nothing wrong here: the API returns the id of the history dataset you requested in the 'id' field. The 'dataset_id' does not refer to a "history dataset", but to the more general "dataset". A history dataset is a particular instance of a dataset in one history, but the same dataset can be used in other histories or libraries and can be shared with other users. So you may have multiple history datasets and library datasets all pointing to the same file on disk.
Cheers, Nicola
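As a small illustration of the point above, the sketch below works on trimmed copies of the two show_dataset() responses quoted in this thread, without contacting a Galaxy server. The third entry, with id 'aaaaaaaaaaaaaaaa' and history id 'bbbbbbbbbbbbbbbb', is a hypothetical copy of the first file in another history, added only to show several history datasets sharing one underlying dataset. The helper names are also made up for this example.

```python
from collections import defaultdict

# Trimmed show_dataset() responses; the last entry is hypothetical.
responses = [
    {"id": "468b2dfe96a5a9a1", "dataset_id": "56c890cbef28295c",
     "name": "a_1.fastq", "history_id": "b8a0d6158b9961df"},
    {"id": "56c890cbef28295c", "dataset_id": "9ccf9e6f1cf4d1fa",
     "name": "High density regions", "history_id": "06ec17aefa2d49dd"},
    {"id": "aaaaaaaaaaaaaaaa", "dataset_id": "56c890cbef28295c",
     "name": "a_1.fastq", "history_id": "bbbbbbbbbbbbbbbb"},
]


def history_dataset_id(hda):
    """Return the history dataset id, i.e. the value found in the 'id' field."""
    return hda["id"]


def group_by_underlying_dataset(hdas):
    """Map each 'dataset_id' to the history dataset ids that point to it."""
    groups = defaultdict(list)
    for hda in hdas:
        groups[hda["dataset_id"]].append(history_dataset_id(hda))
    return dict(groups)


groups = group_by_underlying_dataset(responses)
for dataset_id, hda_ids in groups.items():
    print(dataset_id, "->", hda_ids)
```

Note that '56c890cbef28295c' appears once as a 'dataset_id' and once as an 'id'. These refer to different kinds of object, so passing a 'dataset_id' where a history dataset id is expected returns an unrelated history dataset, which matches what the second show_dataset() call in the question shows.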
Please let me know which of these ids should be used and what the purpose of the other id is.
Thanks for your help and time!
Best, Aarthi
___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: https://lists.galaxyproject.org/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/
Thanks for the detailed explanation Nicola! Best, Aarthi On Wed, May 17, 2017 at 6:29 PM, Nicola Soranzo <nsoranzo@tiscali.it> wrote:
participants (2)
- Aarthi Mohan
- Nicola Soranzo