
Releases: octue/octue-sdk-python

Release/0.1.17

07 May 18:05
8924d88

Contents

New Features

  • Allow Datafile to be used as a context manager for changes to local datafiles
  • Allow Datafile.from_cloud to be used as a context manager for changes to cloud datafiles (both usages are sketched after this list)
  • Allow Datafile to remember where in the cloud it came from
  • Add the following methods to Datafile:
    • get_cloud_metadata
    • update_cloud_metadata
    • clear_from_file_cache
    • _get_cloud_location
    • _store_cloud_location
    • _check_for_attribute_conflict
  • Avoid re-uploading a Datafile's file or metadata if they haven't changed
  • Raise error if implicit cloud location is missing from Datafile
  • Add GoogleCloudStorageClient.update_metadata method
  • Allow option to not update cloud metadata in Datafile cloud methods
  • Allow tags to contain capitals and forward slashes (but not to start or end with a forward slash)
  • Allow datetime and POSIX timestamps for Datafile.timestamp
  • Add Datafile.posix_timestamp property
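
For illustration, here is a minimal sketch of the new context-manager usage. The parameter names (project_name, bucket_name, datafile_path), the (datafile, f) unpacking, and the import path are assumptions for this version, not confirmed signatures:

    import datetime

    from octue.resources import Datafile  # import path is an assumption

    # Local datafile: changes made inside the block are persisted when it exits.
    with Datafile(path="data/measurements.csv", mode="a") as (datafile, f):
        f.write("1612193400,23.7\n")
        datafile.timestamp = datetime.datetime.now()  # datetime or POSIX timestamps are both accepted

    # Cloud datafile: the file and metadata are only re-uploaded on exit if they have changed.
    with Datafile.from_cloud(
        project_name="my-project",          # assumed parameter name
        bucket_name="my-bucket",            # assumed parameter name
        datafile_path="datasets/measurements.csv",  # assumed parameter name
        mode="r",
    ) as (datafile, f):
        print(datafile.get_cloud_metadata())
        print(f.read())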

Breaking changes

  • Close #148: remove hash_value from Datafile GCS metadata
  • When hashing Datafiles, only hash the represented file (i.e. stop hashing metadata)
  • When hashing Datasets and Manifests, only hash the files they contain (i.e. stop hashing metadata)
  • Make the hash of a Hashable instance with _ATTRIBUTES_TO_HASH=None the empty-string hash value "AAAAAA=="

Minor improvements

  • Simplify output of GoogleCloudStorageClient.get_metadata
  • Make Hashable instances re-calculate their hash_value every time unless an immutable_hash_value is explicitly provided (e.g. for cloud datafiles where the file isn't available locally to hash); see the sketch after this list
  • Add private Identifiable._set_id method
  • Close #147: pull metadata gathering for Datafile into method
  • Get datetime objects directly from GCS blob instead of parsing string serialisations
  • Add time utils module
  • Add hash preparation function to Hashable for datetime instances
  • Use the empty string hash value for Datafile if GCS crc32c metadata isn't present
  • Stop serialising hash value of Manifest, Dataset, and Datafile
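
A sketch of the hashing behaviour described above; whether immutable_hash_value is passed to the Datafile constructor like this, and the import path, are assumptions:

    from octue.resources import Datafile  # import path is an assumption

    datafile = Datafile(path="data/measurements.csv")
    datafile.hash_value  # recalculated from the file's contents on every access, not cached

    # For a cloud datafile with no local copy to hash, an immutable hash value can be
    # provided instead (here an illustrative base64-encoded CRC32C value):
    cloud_datafile = Datafile(path="data/measurements.csv", immutable_hash_value="nQ4J+A==")
    assert cloud_datafile.hash_value == "nQ4J+A=="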

Fixes

  • Close #146: Stop serialising GCS metadata as JSON. This avoids strings in the metadata appearing in two sets of quotation marks on Google Cloud Storage. This is a breaking change for any files already persisted with JSON-encoded metadata.
  • Remove ability to set custom hash value via kwargs when using Datafile.from_cloud

Testing

  • Factor out cloud datafile creation in datafile tests

Quality Checklist

  • New features are fully tested (No matter how much Coverage Karma you have)

Release/0.1.16

03 May 10:31
aa1f9cc

Contents

Breaking changes

  • Rename the Service.__init__ parameter id to service_id to avoid clashing with the built-in name (see the sketch after this list)
  • Move deployment package into cloud package
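
A minimal sketch of the rename; the import paths and backend arguments here are assumptions for illustration only:

    from octue.resources.service_backends import GCPPubSubBackend  # assumed import path
    from octue.resources import Service                            # assumed import path

    backend = GCPPubSubBackend(project_name="my-project")  # assumed constructor arguments

    # Before 0.1.16: Service(id="octue.services.analysis", ...) shadowed the built-in `id`.
    service = Service(service_id="octue.services.analysis", backend=backend)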

Dependencies

  • Use newest version of twined to support python>=3.6

Minor improvements

  • Remove duplicate code and unnecessary comments from Runner
  • Raise error if SERVICE_ID envvar is missing from deployment environment
  • Disallow non-None empty values as Service IDs
  • Add base class for service backends; update docstrings

Fixes

  • Use OctueJSONEncoder in JSON serialisation inside Service.answer to ensure numpy arrays are serialised
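
The fix in context: a sketch of serialising an answer that contains numpy arrays (the OctueJSONEncoder import path is an assumption):

    import json

    import numpy as np

    from octue.utils.encoders import OctueJSONEncoder  # assumed import path

    answer = {"output_values": np.arange(3)}

    try:
        json.dumps(answer)  # the default encoder raises TypeError for numpy arrays
    except TypeError:
        pass

    serialised = json.dumps(answer, cls=OctueJSONEncoder)  # the array is serialised (e.g. as a list)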

Testing

  • Add tests for Topic and Subscription
  • Add extra test for Service
  • Shorten runtime of cli.start test

Release/0.1.15

26 Apr 14:04
258f568

Contents

Fixes

  • Add from_string option to Serialisable.deserialise
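
A sketch of the new option; the exact serialise/deserialise signatures and import path are assumptions:

    import json

    from octue.resources import Datafile  # Datafile mixes in Serialisable; import path is an assumption

    datafile = Datafile(path="data/measurements.csv")

    serialised_dictionary = datafile.serialise()           # assumed to return a dictionary
    serialised_string = json.dumps(serialised_dictionary)

    Datafile.deserialise(serialised_dictionary)                # previous behaviour
    Datafile.deserialise(serialised_string, from_string=True)  # new in 0.1.15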

Testing

  • Mock Google Pub/Sub Service, Topic, Subscription, Publisher and Subscriber in tests
  • Remove unneeded cleanup code from Service tests

Release/0.1.14

23 Apr 16:56
61fa92f

Contents

Breaking changes

  • Remove TagSet.__str__

Fixes

  • Use TagSet to deserialise tags in Datafile.from_cloud
  • Add custom (de)serialise methods to TagSet
  • Return subtags of a Tag in order using a FilterList
  • Remove the separate dependency copy/cache steps in the Google Cloud Run Dockerfile so that it works with older versions of Docker

Minor improvements

  • Remove absolute path from Dataset and Manifest serialisation
  • Add Serialisable.deserialise method
  • Add filter method to TagSet to avoid e.g. taggable.tags.tags.filter
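
A sketch of the convenience method; the construction shown and the filter keyword are assumptions:

    from octue.resources.tag import TagSet  # assumed import path

    tags = TagSet("system:1 system:2 experiment-a")  # space-delimited construction is an assumption

    tags.tags.filter(name__starts_with="system")  # previously: reach through the inner set
    tags.filter(name__starts_with="system")       # now: filter directly on the TagSet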

Operations

  • Improve description of release workflow

Release/0.1.13

21 Apr 12:35
eb0817b

Contents

New features

  • Support setup.py and requirements-dev.txt in Cloud Run Dockerfile
  • Retrieve credentials from Google Cloud Secret Manager and inject into environment in Runner.run
  • Add ability to retrieve and update cloud files via the Datafile.download or Datafile.open methods (see the sketch after this list)
  • Allow cloud file attributes to be updated via Datafile.to_cloud method
  • Allow instantiation of TagSets from JSON-encoded lists
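
A sketch of the cloud-file workflow these features enable; the parameter names, mode argument, and return values are assumptions:

    from octue.resources import Datafile  # import path is an assumption

    datafile = Datafile.from_cloud(
        project_name="my-project",
        bucket_name="my-bucket",
        datafile_path="datasets/measurements.csv",
    )

    local_path = datafile.download()  # fetch a local copy (return value is an assumption)

    with datafile.open("a") as f:     # or open the cloud file for local editing
        f.write("1612193400,23.7\n")

    datafile.to_cloud(                # push the updated file and its attributes back
        project_name="my-project",
        bucket_name="my-bucket",
        path_in_bucket="datasets/measurements.csv",
    )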

Breaking changes

  • Raise error if the datasets of the input manifest passed to Service.ask aren't all cloud-based

Fixes

  • Fix Dataset construction from serialised form in Manifest
  • Fix Datafile construction from serialised form in Dataset
  • Fix Datafile.deserialise
  • Adjust usages of tempfile.NamedTemporaryFile to also work on Windows
  • Add timeout and retry to Service.answer
  • Add retry to Service.wait_for_answer
  • Add 60 second timeout for answering question in Cloud Run deployment
  • Use correct environment variable for service ID in Cloud Run Dockerfile
  • Set _last_modified, size_bytes, and _hash_value to null values if a Datafile representing a cloud file is instantiated for a hypothetical cloud location (i.e. not synced to a cloud file at that point in time)
  • Allow Dataset.get_file_sequence use with no filter

Dependencies

  • Use new twined version that supports validation of credentials strand
  • Use newest version of gcp-storage-emulator

Minor improvements

  • Make path a positional argument of Datafile
  • Move gunicorn requirement into octue requirements
  • Raise warning instead of error if Google Cloud credentials environment variable is not found and return None as credentials
  • Move cloud code into new cloud subpackage
  • Raise TimeoutError in Service.wait_for_answer if no response is received by end of retries
  • Only look for deployment_configuration.json file in docker container /app directory
  • Ensure deployment_configuration.json file is always loaded correctly in docker container
  • Pass credentials strand into Runner instance in Cloud Run deployment
  • Add name attribute to Identifiable mixin
  • Add Google Cloud metadata to Datafile serialisation
  • Add deserialise method to Datafile
  • Add ability to add metadata to a Datafile instantiated from a regular cloud file
  • Use CRC32C hash value from Google Cloud when instantiating a Datafile from the cloud
  • Add ability to name Datafiles
  • Add ability to check whether a Datafile, all Datafiles in a Dataset, or all Datasets in a Manifest are located in Google Cloud
  • Use Datafile.deserialise when instantiating a Dataset from a dictionary
  • Add representation to GCPPubSubBackend
  • Load credentials strand JSON in Runner initialisation
  • Add location searched to message of error raised when app module can't be found in Runner.run
  • Ignore E203 flake8 warning

Testing

  • Remove subjective Service test test_serve_with_timeout
  • Use temporary file rather than temporary directory for tests where possible
  • Test Dataset.deserialise

Quality Checklist

  • New features are fully tested (No matter how much Coverage Karma you have)

Coverage Karma

  • If your PR decreases test coverage, do you feel you have built enough Coverage Karma* to justify it?

Release/0.1.12

26 Mar 21:02
70b4ec3

Contents

New Features

  • Add Google Cloud Run deployment for services

Breaking changes

  • Move most parameters from Runner.run to Runner.__init__, avoiding the need for partial functions (see the sketch after this list)
  • Split Service.answer into two methods
  • Return question UUID from Service.ask
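
A sketch of the new construction pattern; the import path and specific constructor arguments are assumptions:

    from octue import Runner  # import path is an assumption

    # Before: most of this configuration went to Runner.run, which encouraged functools.partial.
    # Now a configured runner can simply be reused for multiple runs.
    runner = Runner(app_src="path/to/app", twine="path/to/twine.json")
    analysis = runner.run(input_values={"n_iterations": 10})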

Minor fixes and improvements

  • Use the CRC32C hash function instead of Blake3 (due to Blake3's extra requirements and the fact that Google Cloud uses CRC32C); see the sketch after this list
  • Use default Google credentials in Pub/Sub service if GCPPubSubBackend.credentials_environment_variable is None
  • Add representations to Topic and Subscription
  • Ensure all topic/subscription names start with their provided namespace (and ensure the namespace appears only once)
  • Give Services a random UUID as an ID if none is provided
  • Give GCPPubSubBackend a default value for the credentials environment variable
  • Ensure GCP Storage paths always have the correct path separator
  • Fix other Windows path issues
  • Remove unused copy_template function
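
For context, a sketch of computing a CRC32C checksum with the google-crc32c package; whether this exact package is what the SDK wraps internally is an assumption. Google Cloud Storage reports the same value, base64-encoded, in its object metadata:

    import base64

    import google_crc32c

    checksum = google_crc32c.Checksum()
    checksum.update(b"file contents")
    print(base64.b64encode(checksum.digest()).decode())  # base64-encoded CRC32C, as GCS stores it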

Testing

  • Add automated testing for Windows and macOS (in addition to Ubuntu)
  • Use tox for cross-platform testing
  • Use sys.executable instead of python in subprocess.Popen calls to ensure the virtual environment's Python executable is used (see the sketch after this list)
  • Ensure test paths are agnostic of operating system
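
A sketch of the sys.executable change; the script path is hypothetical:

    import subprocess
    import sys

    # Using sys.executable guarantees the subprocess runs under the same interpreter
    # (and therefore the same virtual environment) as the test suite itself.
    process = subprocess.Popen([sys.executable, "scripts/start_service.py"])  # hypothetical script
    process.wait()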

Quality Checklist

  • New features are fully tested (No matter how much Coverage Karma you have)

Coverage Karma

  • If your PR decreases test coverage, do you feel you have built enough Coverage Karma* to justify it?

Release/0.1.11

15 Mar 13:01
e4da82e

Contents

Minor fixes and improvements

  • Remove test bucket environment variable
  • Remove environment variable default argument from GoogleCloudStorageEmulator
  • Add installation, usage, and testing instructions to README

Testing

  • Test ability to start more than one Google Cloud Storage emulator at once

Release: 0.1.10

12 Mar 20:18
6b9e4b2

Contents

New Features

  • Move Google Cloud Storage emulator into octue package, making it importable

Minor fixes and improvements

  • Allow storage emulator to find and use a free port
  • Remove need for STORAGE_EMULATOR_HOST environment variable for tests
  • Avoid assuming custom metadata is set in storage client
  • Move unittest.TestResult method replacements into Google Cloud emulators module
  • Remove tox from CI tests, using just GitHub actions instead

Add Google Cloud Storage support; deprecate python < 3.8

10 Mar 16:19
aa00826

Contents

New Features

  • Add GoogleCloudStorageClient
  • Write manifest, its datasets, and its datafiles to cloud in Analysis.finalise (#96)
  • Closes #84 - add auto tag and release workflow
  • Allow Google Cloud storage blobs to be represented by Pathable
  • Add Datafile, Dataset, and Manifest to_cloud and from_cloud methods
  • Allow regular GCP files to be represented as Datafiles

Minor fixes and improvements

  • Add cloud storage emulator once for all tests
  • Add disk usage and file age utilities
  • Allow Datasets to have custom names
  • Add a storage.path module akin to os.path but for Google Cloud Storage paths (see the sketch after this list)
  • Allow Hashables' hash values to be set
  • Pass GCP project and bucket names to tests from environment (#93)
  • Add ability to delete topic and subscription when a Service has finished serving
  • Facilitate graceful exit for serving Services on KeyboardInterrupt
  • Use latest versions of flake8, isort, and black in pre-commit and across all files (#87)
  • Fix CI test skipping flag
  • Fix documentation links (#92)
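
A sketch of the storage.path idea; the module location and the join helper shown are assumptions:

    from octue.utils.cloud.storage import path as storage_path  # assumed module location

    blob_path = storage_path.join("datasets", "wind", "measurements.csv")
    print(blob_path)  # "datasets/wind/measurements.csv" regardless of operating system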

Breaking changes

  • Remove testing and explicit support for python3.6 and python3.7
  • Remove base_from from Pathable and replace with more transparent method
  • Rename Datafile.posix_timestamp to Datafile.timestamp and remove default value
  • Make Datafile.last_modified private
  • Rename persistence subpackage to storage

Testing

  • Test that children can question their own children as part of answering a question
  • Close #94 - delete topics and subscriptions at the end of each test
  • Remove timeouts from tests and replace them with thread executor shutdown upon test pass, so that tests connecting to Google Pub/Sub don't fail just because the connection is slower than expected

Child services, documentation, easier logging, and CI

01 Feb 15:06
7bc5f01

Contents

New Features

  • Enable use of child services, solving #46.
    • To solve #57, we need to be able to define and run local children as well as remote ones.
    • This means we must allow multiple services to run locally and independently, which probably means we can also solve octue/twined-server#2 at the same time.
  • Enable Documentation Build and Serve, Update README #70
    • Ultimately we wish to unify documentation between twined and octue-sdk-python, but that is best done when refactoring large chunks of octue-sdk-python into twined (see #69); for now we just serve what we've got so we can at least link to it.
  • Add option to handle developer logs separately from Scientist logs (#78)
  • Allow CI tests to be skipped when #skip_ci_tests is in the commit body - the use case is avoiding unnecessary computation when you know the tests will fail for a commit but still want to commit it.

Minor fixes and improvements

  • Implement a proper issue template, either derived from the .github repo or applied directly (cf. octue/twined#60)
  • Close #32 - stop CLI tests leaving output files in the working area.