[galaxyproject/galaxy] 3fa608: Use pysam.merge instead of samtools merge
Branch: refs/heads/dev
Home:   https://github.com/galaxyproject/galaxy

Commit: 3fa608f54850d9a93e910cdc55fa1f87c4c87c7c
    https://github.com/galaxyproject/galaxy/commit/3fa608f54850d9a93e910cdc55fa1...
Author: mvdbeek <m.vandenbeek@gmail.com>
Date:   2017-12-08 (Fri, 08 Dec 2017)

Changed paths:
  M lib/galaxy/datatypes/binary.py

Log Message:
-----------
Use pysam.merge instead of samtools merge


Commit: 16d0d3ecafb47ca8f112e8fa339f492446f9e80c
    https://github.com/galaxyproject/galaxy/commit/16d0d3ecafb47ca8f112e8fa339f4...
Author: mvdbeek <m.vandenbeek@gmail.com>
Date:   2017-12-08 (Fri, 08 Dec 2017)

Changed paths:
  A lib/galaxy/datatypes/test/1.unsorted.bam

Log Message:
-----------
Add unsorted test bam


Commit: 3c9999d89acfdd7aa1f56969ca7e03da3d83e655
    https://github.com/galaxyproject/galaxy/commit/3c9999d89acfdd7aa1f56969ca7e0...
Author: mvdbeek <m.vandenbeek@gmail.com>
Date:   2017-12-08 (Fri, 08 Dec 2017)

Changed paths:
  M lib/galaxy/datatypes/binary.py
  M lib/galaxy/datatypes/test/1.bam
  M lib/galaxy/datatypes/test/1.unsorted.bam

Log Message:
-----------
Use pysam for metadata setting and grooming

There are also some noteworthy changes here:
- We do always respect the sort-order specified in the header
- If the sort order is not mentioned in the header or no header exists
  we coordinate-sort the file.
- We do not use indexing to determine if a file is coordinate sorted,
  because this does not work reliably with samtools/pysam > 1.X, since
  arbitrarily sorted files can be indexed now.

This also fixes advanced metadata setting (sort_order, bam_version and
more), which appears to have been broken. This probably went by unnoticed
because of the catch-all try-except-pass.
The downside to fixing this is that I had to (temporarily, hopefully)
comment out the reference_names, reference_lengths, bam_header and
readgroups attributes, because they led to the following error:

```
galaxy.model.metadata DEBUG 2017-11-19 14:47:30,540 loading metadata from file for: HistoryDatasetAssociation 582
galaxy.jobs.runners.local ERROR 2017-11-19 14:47:30,636 Job wrapper finish method failed
Traceback (most recent call last):
  File "/Users/mvandenb/src/galaxy/lib/galaxy/jobs/runners/local.py", line 130, in queue_job
    job_wrapper.finish(stdout, stderr, exit_code)
  File "/Users/mvandenb/src/galaxy/lib/galaxy/jobs/__init__.py", line 1357, in finish
    self.sa_session.flush()
  File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/orm/scoping.py", line 157, in do
    return getattr(self.registry(), name)(*args, **kwargs)
  File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2019, in flush
    self._flush(objects)
  File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2137, in _flush
    transaction.rollback(_capture_exception=True)
  File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2101, in _flush
    flush_context.execute()
  File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 373, in execute
    rec.execute(self)
  File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 532, in execute
    uow
  File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 170, in save_obj
    mapper, table, update)
  File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 706, in _emit_update_statements
    execute(statement, multiparams)
  File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 914, in execute
    return meth(self, multiparams, params)
  File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1146, in _execute_context
    context)
  File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1341, in _handle_dbapi_exception
    exc_info
  File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 202, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
    context)
  File "/Users/mvandenb/src/galaxy/.venv/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
    cursor.execute(statement, parameters)
OperationalError: (psycopg2.OperationalError) index row size 7208 exceeds maximum 2712 for index "ix_history_dataset_association_metadata"
HINT: Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.
[SQL: 'UPDATE history_dataset_association SET update_time=%(update_time)s, blurb=%(blurb)s, peek=%(peek)s, metadata=%(_metadata)s WHERE history_dataset_association.id = %(history_dataset_association_id)s'] [parameters: {'_metadata': <psycopg2.extensions.Binary object at 0x119ed8990>, 'update_time': datetime.datetime(2017, 11, 19, 13, 47, 30, 598058), 'history_dataset_association_id': 582, 'blurb': '3.5 KB', 'peek': 'Binary bam alignments file'}]
```


Commit: 369485a43ad50a877dc68324c1a2b6ab876030ea
    https://github.com/galaxyproject/galaxy/commit/369485a43ad50a877dc68324c1a2b...
Author: mvdbeek <m.vandenbeek@gmail.com>
Date:   2017-12-08 (Fri, 08 Dec 2017)

Changed paths:
  M lib/galaxy/datatypes/binary.py
  A test/unit/datatypes/test_bam.py

Log Message:
-----------
Move Bam doctests to unittests


Commit: ae72e5668364f6f72861c21a62a641f9f65636e1
    https://github.com/galaxyproject/galaxy/commit/ae72e5668364f6f72861c21a62a64...
Author: mvdbeek <m.vandenbeek@gmail.com>
Date:   2017-12-08 (Fri, 08 Dec 2017)

Changed paths:
  M lib/galaxy/datatypes/binary.py
  M test/unit/datatypes/test_bam.py

Log Message:
-----------
Use subprocess to check if pysam.index succeeds

Checking if pysam.index succeeds tests whether a file is coordinate
sorted. If pysam.index fails to index, htslib writes to stderr, and this
fails the set meta tool. To prevent this we run this in a subprocess and
discard stderr.


Commit: 836aea072e84411fb4a24e37fb306b5ec72044c6
    https://github.com/galaxyproject/galaxy/commit/836aea072e84411fb4a24e37fb306...
Author: mvdbeek <m.vandenbeek@gmail.com>
Date:   2017-12-08 (Fri, 08 Dec 2017)

Changed paths:
  A lib/galaxy/datatypes/test/2.shuffled.bam

Log Message:
-----------
Add test file with random order


Commit: bac56d6b27068d3744e84eb9756bc28bf392f718
    https://github.com/galaxyproject/galaxy/commit/bac56d6b27068d3744e84eb9756bc...
Author: mvdbeek <m.vandenbeek@gmail.com>
Date:   2017-12-08 (Fri, 08 Dec 2017)

Changed paths:
  M lib/galaxy/datatypes/set_metadata_tool.xml
  M tools/data_source/upload.xml

Log Message:
-----------
Drop samtools from metadata and upload tools

We only need samtools for the dataproviders, which shouldn't be used by
these tools.


Commit: df35d089e3ea8c15330afaeedfa24e9c2f0fa7b2
    https://github.com/galaxyproject/galaxy/commit/df35d089e3ea8c15330afaeedfa24...
Author: mvdbeek <m.vandenbeek@gmail.com>
Date:   2017-12-08 (Fri, 08 Dec 2017)

Changed paths:
  A lib/galaxy/datatypes/test/1.vcf
  A lib/galaxy/datatypes/test/1.vcf.gz
  A lib/galaxy/datatypes/test/2.cram
  M test/unit/datatypes/test_bam.py
  A test/unit/datatypes/test_bcf.py
  A test/unit/datatypes/test_cram.py
  A test/unit/datatypes/test_vcf.py
  A test/unit/datatypes/util.py

Log Message:
-----------
Add more unittests for CRAM, bcf and vcf


Commit: 3f6a59e85ff8acf2fb0b39670f450a523ea11f38
    https://github.com/galaxyproject/galaxy/commit/3f6a59e85ff8acf2fb0b39670f450...
Author: mvdbeek <m.vandenbeek@gmail.com>
Date:   2017-12-08 (Fri, 08 Dec 2017)

Changed paths:
  M lib/galaxy/datatypes/binary.py
  M lib/galaxy/datatypes/tabular.py

Log Message:
-----------
Replace bcftools index with pysam.bcftools.index or pysam.tabix_index


Commit: 8e580ec7ac599993ae082732b5c80f2b9bcce493
    https://github.com/galaxyproject/galaxy/commit/8e580ec7ac599993ae082732b5c80...
Author: mvdbeek <m.vandenbeek@gmail.com>
Date:   2017-12-08 (Fri, 08 Dec 2017)

Changed paths:
  M lib/galaxy/datatypes/tabular.py

Log Message:
-----------
Metadata fixes for VcfGz


Commit: 0ac77ad7fd5c6d6660521d69f6010cb61d4947ac
    https://github.com/galaxyproject/galaxy/commit/0ac77ad7fd5c6d6660521d69f6010...
Author: mvdbeek <m.vandenbeek@gmail.com>
Date:   2017-12-08 (Fri, 08 Dec 2017)

Changed paths:
  M lib/galaxy/datatypes/tabular.py
  M test/unit/datatypes/test_vcf.py
  M test/unit/datatypes/util.py

Log Message:
-----------
Fix Vcf index generation and test


Commit: 6ca896ff9b199ac097f408e35e37a3365ff2ec46
    https://github.com/galaxyproject/galaxy/commit/6ca896ff9b199ac097f408e35e37a...
Author: mvdbeek <m.vandenbeek@gmail.com>
Date:   2017-12-08 (Fri, 08 Dec 2017)

Changed paths:
  M lib/galaxy/datatypes/tabular.py
  M test/unit/datatypes/test_bcf.py
  M test/unit/datatypes/test_cram.py
  M test/unit/datatypes/test_vcf.py
  M test/unit/datatypes/util.py

Log Message:
-----------
Fix index file syntax and fix linting problems


Commit: 9c4a2f70ea6a62f35b3c736730fcc71bddf6dce9
    https://github.com/galaxyproject/galaxy/commit/9c4a2f70ea6a62f35b3c736730fcc...
Author: Dannon Baker <dannon.baker@gmail.com>
Date:   2017-12-08 (Fri, 08 Dec 2017)

Changed paths:
  M lib/galaxy/datatypes/converters/interval_to_tabix_converter.py

Log Message:
-----------
Tabix indexing fixes. Upstream used 'index' instead of 'index_filename',
and 'force' is now required since we precreate the destination location.


Commit: 2a6e00dd001ea4f27cd7d9a12cddfcf9649da7ef
    https://github.com/galaxyproject/galaxy/commit/2a6e00dd001ea4f27cd7d9a12cddf...
Author: mvdbeek <m.vandenbeek@gmail.com>
Date:   2017-12-08 (Fri, 08 Dec 2017)

Changed paths:
  M test/unit/datatypes/test_cram.py

Log Message:
-----------
one more linting fix


Commit: 07158e03308e026ee013bd6a50708167918c2d27
    https://github.com/galaxyproject/galaxy/commit/07158e03308e026ee013bd6a50708...
Author: mvdbeek <m.vandenbeek@gmail.com>
Date:   2017-12-08 (Fri, 08 Dec 2017)

Changed paths:
  M lib/galaxy/datatypes/tabular.py

Log Message:
-----------
Need force=True in pysam.tabix_index because the index file path exists
in the object store


Commit: 64deb4f935ac7969a0e81024297d84d9831a8cfd
    https://github.com/galaxyproject/galaxy/commit/64deb4f935ac7969a0e81024297d8...
Author: mvdbeek <m.vandenbeek@gmail.com>
Date:   2017-12-08 (Fri, 08 Dec 2017)

Changed paths:
  M lib/galaxy/datatypes/converters/cram_to_bam.py

Log Message:
-----------
We can use multiple threads when converting cram to bam files


Commit: 90a56aea3f19df6943dfc648474d161e5c8f6d66
    https://github.com/galaxyproject/galaxy/commit/90a56aea3f19df6943dfc648474d1...
Author: mvdbeek <m.vandenbeek@gmail.com>
Date:   2017-12-08 (Fri, 08 Dec 2017)

Changed paths:
  M lib/galaxy/datatypes/binary.py

Log Message:
-----------
Use more threads when sorting BAM files and GALAXY_SLOTS is set


Commit: eb636c11361ef8f6cf85c8750f0a37608dcbee1a
    https://github.com/galaxyproject/galaxy/commit/eb636c11361ef8f6cf85c8750f0a3...
Author: mvdbeek <m.vandenbeek@gmail.com>
Date:   2017-12-08 (Fri, 08 Dec 2017)

Changed paths:
  A test/unit/datatypes/converters/__init__.py
  A test/unit/datatypes/converters/test_interval_to_tabix.py
  M test/unit/datatypes/test_bam.py
  M test/unit/datatypes/util.py

Log Message:
-----------
Make sure files don't change in-place


Commit: 1195183ea63c3916e1b8e163b2cc0839bd18d8c4
    https://github.com/galaxyproject/galaxy/commit/1195183ea63c3916e1b8e163b2cc0...
Author: mvdbeek <m.vandenbeek@gmail.com>
Date:   2017-12-08 (Fri, 08 Dec 2017)

Changed paths:
  M lib/galaxy/visualization/data_providers/genome.py

Log Message:
-----------
Symlink tbi index to work around pysam limitation

Before https://github.com/pysam-developers/pysam/pull/586 is merged and
a new release is out we create a symlink to the tbi file, which is
required for creating TabixFile instances. Since we want to clean up the
symlinks I turned `get_data_file` into a contextmanager. Along the way I
also changed many open()/close() calls to `with` statements.


Commit: 4795e490636e9a9234cb2d61c0a14508472a297d
    https://github.com/galaxyproject/galaxy/commit/4795e490636e9a9234cb2d61c0a14...
Author: mvdbeek <m.vandenbeek@gmail.com>
Date:   2017-12-08 (Fri, 08 Dec 2017)

Changed paths:
  M lib/galaxy/visualization/data_providers/genome.py

Log Message:
-----------
Detect if index symlink hack is necessary


Commit: 46371d1251ed80b740956dc7111708aa93041092
    https://github.com/galaxyproject/galaxy/commit/46371d1251ed80b740956dc711170...
Author: mvdbeek <m.vandenbeek@gmail.com>
Date:   2017-12-08 (Fri, 08 Dec 2017)

Changed paths:
  M lib/galaxy/datatypes/converters/interval_to_tabix_converter.py
  M test/unit/datatypes/converters/test_interval_to_tabix.py
  M test/unit/datatypes/util.py

Log Message:
-----------
Decompose interval_to_tabix converter script

This renames some variables to make it clearer what files they reflect.
Also adds a very basic test that this works as intended.


Commit: 309b71720e3b5615fac6abd028aa66735fa94eef
    https://github.com/galaxyproject/galaxy/commit/309b71720e3b5615fac6abd028aa6...
Author: mvdbeek <m.vandenbeek@gmail.com>
Date:   2017-12-10 (Sun, 10 Dec 2017)

Changed paths:
  M lib/galaxy/datatypes/binary.py
  M lib/galaxy/datatypes/tabular.py
  R lib/galaxy/datatypes/test/1.unsorted.bam
  M lib/galaxy/visualization/data_providers/genome.py
  M test/unit/datatypes/util.py

Log Message:
-----------
Improvements to datatypes suggested by @nsoranzo


Commit: e198dd75782ce5b2a9619a5283e77a026393e149
    https://github.com/galaxyproject/galaxy/commit/e198dd75782ce5b2a9619a5283e77...
Author: Nicola Soranzo <nsoranzo@tiscali.it>
Date:   2017-12-10 (Sun, 10 Dec 2017)

Changed paths:
  M lib/galaxy/datatypes/binary.py
  M lib/galaxy/datatypes/converters/cram_to_bam.py
  M lib/galaxy/datatypes/converters/interval_to_tabix_converter.py
  M lib/galaxy/datatypes/set_metadata_tool.xml
  M lib/galaxy/datatypes/tabular.py
  M lib/galaxy/datatypes/test/1.bam
  A lib/galaxy/datatypes/test/1.vcf
  A lib/galaxy/datatypes/test/1.vcf.gz
  A lib/galaxy/datatypes/test/2.cram
  A lib/galaxy/datatypes/test/2.shuffled.bam
  M lib/galaxy/visualization/data_providers/genome.py
  A test/unit/datatypes/converters/__init__.py
  A test/unit/datatypes/converters/test_interval_to_tabix.py
  A test/unit/datatypes/test_bam.py
  A test/unit/datatypes/test_bcf.py
  A test/unit/datatypes/test_cram.py
  A test/unit/datatypes/test_vcf.py
  A test/unit/datatypes/util.py
  M tools/data_source/upload.xml

Log Message:
-----------
Merge pull request #5037 from mvdbeek/samtools_to_pysam

Samtools to pysam


Compare: https://github.com/galaxyproject/galaxy/compare/5ad4f57a63b0...e198dd75782c
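
Illustrative sketch (not part of the commits above): the sort-order rule described in the "Use pysam for metadata setting and grooming" log message, written against plain SAM header text so that it runs without pysam. The function names `declared_sort_order` and `needs_coordinate_sort` are hypothetical and do not come from Galaxy's binary.py. Treating `SO:unknown` the same as a missing tag is an assumption made here; the log message does not say how that value is handled.

```python
def declared_sort_order(header_text):
    """Return the SO value from the @HD line of a SAM header, or None.

    header_text is the plain-text header (tab-separated fields, one
    record per line). An empty string stands for "no header exists".
    """
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
            return None  # @HD line present, but no SO tag
    return None  # no @HD line at all


def needs_coordinate_sort(header_text):
    """Decide whether a file should be coordinate-sorted on grooming.

    A sort order declared in the header is respected; a file is only
    sorted when no order is declared. ASSUMPTION: "unknown" is treated
    as "not declared".
    """
    sort_order = declared_sort_order(header_text)
    return sort_order is None or sort_order == "unknown"


if __name__ == "__main__":
    print(needs_coordinate_sort("@HD\tVN:1.5\tSO:queryname"))  # declared order
    print(needs_coordinate_sort("@SQ\tSN:chr1\tLN:1000"))      # no @HD line
```

The decision reads only the header, never an index, in line with the log message's point that a successful index no longer proves a file is coordinate sorted.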