Releases: octue/octue-sdk-python
Release/0.1.17
Contents
New Features
- Allow `Datafile` to be used as a context manager for changes to local datafiles
- Allow `Datafile.from_cloud` to be used as a context manager for changes to cloud datafiles
- Allow `Datafile` to remember where in the cloud it came from
- Add the following methods to `Datafile`: `get_cloud_metadata`, `update_cloud_metadata`, `clear_from_file_cache`, `_get_cloud_location`, `_store_cloud_location`, `_check_for_attribute_conflict`
- Avoid re-uploading `Datafile` file or metadata if they haven't changed
- Raise error if implicit cloud location is missing from `Datafile`
- Add `GoogleCloudStorageClient.update_metadata` method
- Allow option to not update cloud metadata in `Datafile` cloud methods
- Allow tags to contain capitals and forward slashes (but not start or end in a forward slash)
- Allow `datetime` and posix timestamps for `Datafile.timestamp`
- Add `Datafile.posix_timestamp` property
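The context-manager behaviour listed above can be pictured with a small stand-in class. This is a minimal sketch only: `TrackedFile`, its `checksum` attribute, and the use of `zlib.crc32` are hypothetical choices for illustration and are not the `Datafile` implementation or its API.

```python
import os
import tempfile
import zlib


class TrackedFile:
    """Hypothetical stand-in for a datafile-like object (not octue's Datafile)."""

    def __init__(self, path, mode="r"):
        self.path = path
        self.mode = mode
        self.checksum = None
        self._handle = None

    def __enter__(self):
        # Open the underlying file and hand back both the wrapper and the handle.
        self._handle = open(self.path, self.mode)
        return self, self._handle

    def __exit__(self, exc_type, exc_value, traceback):
        # Always close the file, then refresh derived state so it reflects the changes.
        self._handle.close()
        with open(self.path, "rb") as f:
            self.checksum = zlib.crc32(f.read())
        return False  # Do not suppress exceptions raised inside the block.


directory = tempfile.mkdtemp()
path = os.path.join(directory, "example.txt")

with TrackedFile(path, mode="w") as (tracked, f):
    f.write("some new contents")

with open(path) as f:
    contents = f.read()
```

The point of the pattern is that any bookkeeping (here, a checksum) is refreshed automatically when the `with` block exits, so the caller cannot forget to do it.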
Breaking changes
- Close #148: remove `hash_value` from `Datafile` GCS metadata
- When hashing `Datafile`s, only hash represented file (i.e. stop hashing metadata)
- When hashing `Dataset`s and `Manifest`s, only hash the files contained (i.e. stop hashing metadata)
- Make hash of `Hashable` instance with `_ATTRIBUTES_TO_HASH=None` the empty string hash value `"AAAAAA=="`
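The value `"AAAAAA=="` is consistent with a CRC32C checksum of zero bytes (which is 0) packed as four big-endian bytes and base64-encoded, the encoding Google Cloud Storage uses for its `crc32c` field. The following is a minimal pure-Python sketch of that calculation; the function names are illustrative, and the bitwise loop stands in for whichever CRC32C library the SDK actually uses.

```python
import base64


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def base64_crc32c(data: bytes) -> str:
    """Encode the checksum as base64 of its four big-endian bytes."""
    return base64.b64encode(crc32c(data).to_bytes(4, "big")).decode()


empty_hash = base64_crc32c(b"")
```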
Minor improvements
- Simplify output of `GoogleCloudStorageClient.get_metadata`
- Make `Hashable` instances re-calculate their `hash_value` every time unless an `immutable_hash_value` is explicitly provided (e.g. for cloud datafiles where you don't have the file locally to hash)
- Add private `Identifiable._set_id` method
- Close #147: pull metadata gathering for `Datafile` into method
- Get `datetime` objects directly from GCS blob instead of parsing string serialisations
- Add `time` utils module
- Add hash preparation function to `Hashable` for `datetime` instances
- Use the empty string hash value for `Datafile` if GCS `crc32c` metadata isn't present
- Stop serialising hash value of `Manifest`, `Dataset`, and `Datafile`
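The "re-calculate unless an immutable value is provided" behaviour can be sketched with a property. The class below is a hypothetical illustration, not octue's `Hashable`; `zlib.crc32` is used purely as a convenient standard-library stand-in for the real hash function.

```python
import zlib


class RecalculatingHash:
    """Hypothetical illustration only; not octue's Hashable."""

    def __init__(self, data=b"", immutable_hash_value=None):
        self.data = data
        self._immutable_hash_value = immutable_hash_value

    @property
    def hash_value(self):
        # A fixed value wins if one was supplied (e.g. no local copy to hash).
        if self._immutable_hash_value is not None:
            return self._immutable_hash_value
        # Otherwise recompute on every access so changes are always reflected.
        return format(zlib.crc32(self.data), "08x")


local = RecalculatingHash(b"first")
before = local.hash_value
local.data = b"second"
after = local.hash_value

remote = RecalculatingHash(immutable_hash_value="abc123")
```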
Fixes
- Close #146: Stop serialising GCS metadata as JSON. This avoids strings in the metadata appearing in two sets of quotation marks on Google Cloud Storage. This is a breaking change for any files already persisted with JSON-encoded metadata.
- Remove ability to set custom hash value via `kwargs` when using `Datafile.from_cloud`
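The double-quoting problem addressed by #146 is easy to reproduce with the standard library. The snippet below is only an illustration of the effect; the variable names are made up and it does not show how the SDK writes metadata.

```python
import json

metadata_value = "my-tag"

# JSON-encoding a string wraps it in quotation marks, which become part of the stored value.
json_encoded = json.dumps(metadata_value)

# Storing the plain string leaves it untouched.
plain = metadata_value

# Values persisted in the old JSON-encoded form need decoding before use.
decoded = json.loads(json_encoded)
```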
Testing
- Factor out cloud datafile creation in datafile tests
Quality Checklist
- New features are fully tested (No matter how much Coverage Karma you have)
Release/0.1.16
Contents
Breaking changes
- Rename `Service.__init__` parameter `id` to `service_id` to avoid built-in name clash
- Move `deployment` package into `cloud` package
Dependencies
- Use newest version of `twined` to support python>=3.6
Minor improvements
- Remove duplicate code and unnecessary comments from `Runner`
- Raise error if `SERVICE_ID` envvar is missing from deployment environment
- Disallow non-None empty values as `Service` IDs
- Add base class for service backends; update docstrings
Fixes
- Use `OctueJSONEncoder` in JSON serialisation inside `Service.answer` to ensure `numpy` arrays are serialised
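A custom encoder of this kind is usually a small `json.JSONEncoder` subclass. The sketch below is a hypothetical example, not the SDK's `OctueJSONEncoder`: the class name is invented, and a standard-library `array` stands in for a `numpy` array (both expose `tolist()`), so the example runs without extra dependencies.

```python
import json
from array import array


class ArrayFriendlyEncoder(json.JSONEncoder):
    """Hypothetical encoder for illustration; not the SDK's OctueJSONEncoder."""

    def default(self, obj):
        # numpy arrays (and the stdlib array used here) expose tolist().
        if hasattr(obj, "tolist"):
            return obj.tolist()
        # Fall back to the base class, which raises TypeError for unknown types.
        return super().default(obj)


values = array("d", [1.0, 2.5])  # Stands in for a numpy array.
serialised = json.dumps({"values": values}, cls=ArrayFriendlyEncoder)
```

Without the `cls` argument, `json.dumps` raises a `TypeError` for the same input.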
Testing
- Add tests for `Topic` and `Subscription`
- Add extra test for `Service`
- Shorten runtime of `cli.start` test
Release/0.1.15
Contents
Fixes
- Add `from_string` option to `Serialisable.deserialise`
Testing
- Mock Google Pub/Sub `Service`, `Topic`, `Subscription`, `Publisher` and `Subscriber` in tests
- Remove unneeded cleanup code from `Service` tests
Release/0.1.14
Contents
Breaking changes
- Remove `TagSet.__str__`
Fixes
- Use TagSet to deserialise tags in `Datafile.from_cloud`
- Add custom (de)serialise methods to `TagSet`
- Return subtags of a `Tag` in order using a `FilterList`
- Remove separate dependencies copy/cache steps in Google Cloud Run Dockerfile so that it works for older versions of `docker`
Minor improvements
- Remove absolute path from `Dataset` and `Manifest` serialisation
- Add `Serialisable.deserialise` method
- Add `filter` method to `TagSet` to avoid e.g. `taggable.tags.tags.filter`
Operations
- Improve description of release workflow
Release/0.1.13
Contents
New features
- Support `setup.py` and `requirements-dev.txt` in Cloud Run Dockerfile
- Retrieve credentials from Google Cloud Secret Manager and inject into environment in `Runner.run`
- Add ability to retrieve and update cloud files via the `Datafile.download` or `Datafile.open` methods
- Allow cloud file attributes to be updated via `Datafile.to_cloud` method
- Allow instantiation of `TagSet`s from JSON-encoded lists
Breaking changes
- Raise error if the datasets of the input manifest passed to `Service.ask` aren't all cloud-based
Fixes
- Fix `Dataset` construction from serialised form in `Manifest`
- Fix `Datafile` construction from serialised form in `Dataset`
- Fix `Datafile.deserialise`
- Adjust usages of `tempfile.NamedTemporaryFile` to also work on Windows
- Add timeout and retry to `Service.answer`
- Add retry to `Service.wait_for_answer`
- Add 60 second timeout for answering question in Cloud Run deployment
- Use correct environment variable for service ID in Cloud Run Dockerfile
- Set `_last_modified`, `size_bytes`, and `_hash_value` to null values if a `Datafile` representing a cloud file is instantiated for a hypothetical cloud location (i.e. not synced to a cloud file at that point in time)
- Allow `Dataset.get_file_sequence` use with no filter
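For the `tempfile.NamedTemporaryFile` item above: on Windows, a named temporary file generally cannot be opened a second time by name while the original handle is still open. The release notes do not say which fix the SDK adopted; one common portable pattern, shown here as a sketch, is to create the file with `delete=False`, close it before reopening, and remove it manually.

```python
import os
import tempfile

# delete=False lets the file be closed and then reopened by name on any platform.
temporary_file = tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False)

try:
    temporary_file.write("hello")
    temporary_file.close()  # Close before reopening by name.

    with open(temporary_file.name) as f:
        contents = f.read()
finally:
    temporary_file.close()  # Safe to call twice; ensures the handle is released.
    os.remove(temporary_file.name)  # Clean up manually since delete=False.

file_still_exists = os.path.exists(temporary_file.name)
```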
Dependencies
- Use new `twined` version that supports validation of `credentials` strand
- Use newest version of `gcp-storage-emulator`
Minor improvements
- Make `path` a positional argument of `Datafile`
- Move `gunicorn` requirement into `octue` requirements
- Raise warning instead of error if Google Cloud credentials environment variable is not found and return `None` as credentials
- Move cloud code into new `cloud` subpackage
- Raise `TimeoutError` in `Service.wait_for_answer` if no response is received by end of retries
- Only look for `deployment_configuration.json` file in docker container `/app` directory
- Ensure `deployment_configuration.json` file is always loaded correctly in docker container
- Pass credentials strand into `Runner` instance in Cloud Run deployment
- Add `name` attribute to `Identifiable` mixin
- Add Google Cloud metadata to `Datafile` serialisation
- Add `deserialise` method to `Datafile`
- Add ability to add metadata to a `Datafile` instantiated from a regular cloud file
- Use CRC32C hash value from Google Cloud when instantiating a `Datafile` from the cloud
- Add ability to name `Datafile`s
- Add ability to check whether a `Datafile`, all `Datafile`s in a `Dataset`, or all `Dataset`s in a `Manifest` are located in Google Cloud
- Use `Datafile.deserialise` when instantiating a `Dataset` from a dictionary
- Add representation to `GCPPubSubBackend`
- Load credentials strand JSON in `Runner` initialisation
- Add location searched to message of error raised when `app` module can't be found in `Runner.run`
- Ignore `E203` flake8 warning
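The retry-then-`TimeoutError` behaviour mentioned for `Service.wait_for_answer` follows a familiar shape. The function below is a generic sketch under stated assumptions: `wait_with_retries`, its parameters, and the convention that `None` means "no response yet" are all hypothetical and do not reflect the SDK's actual signature.

```python
import time


def wait_with_retries(poll, retries=3, delay=0.01):
    """Call poll() until it returns something other than None, up to `retries` times."""
    for _ in range(retries):
        result = poll()
        if result is not None:
            return result
        time.sleep(delay)  # Brief pause before trying again.
    raise TimeoutError(f"No response received after {retries} attempts.")


# Simulate a response that only arrives on the third poll.
responses = iter([None, None, "answer"])
result = wait_with_retries(lambda: next(responses))
```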
Testing
- Remove subjective `Service` test `test_serve_with_timeout`
- Use temporary file rather than temporary directory for tests where possible
- Test `Dataset.deserialise`
Quality Checklist
- New features are fully tested (No matter how much Coverage Karma you have)
Coverage Karma
- If your PR decreases test coverage, do you feel you have built enough Coverage Karma* to justify it?
Release/0.1.12
Contents
New Features
- Add Google Cloud Run deployment for services
Breaking changes
- Move most parameters from `Runner.run` to `Runner.__init__` (this avoids the need for partial functions)
- Split `Service.answer` into two methods
- Return question UUID from `Service.ask`
Minor fixes and improvements
- Use CRC32C hash function instead of Blake3 (due to extra requirements of Blake3 and the fact that Google Cloud uses CRC32C)
- Use default Google credentials in Pub/Sub service if `GCPPubSubBackend.credentials_environment_variable` is `None`
- Add representations to `Topic` and `Subscription`
- Ensure all topic/subscription names start with their provided namespace (and ensure the namespace appears only once)
- Give `Service`s a random UUID as an ID if none is provided
- Give `GCPPubSubBackend` a default value for the credentials environment variable
- Ensure GCP Storage paths always have the correct path separator
- Fix other Windows path issues
- Remove unused `copy_template` function
Testing
- Add automated testing for Windows and MacOS (in addition to Ubuntu)
- Use `tox` for cross-platform testing
- Use `sys.executable` instead of `python` in `subprocess.Popen` calls to ensure the virtual environment's python executable is used
- Ensure test paths are agnostic of operating system
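The `sys.executable` change above can be illustrated as follows. A bare `"python"` is resolved through `PATH` and may point at a different interpreter, whereas `sys.executable` is the path of the interpreter currently running. The child command here is a placeholder for illustration.

```python
import subprocess
import sys

# sys.executable is the path of the interpreter running this script,
# so the child process uses the same (virtual) environment.
process = subprocess.Popen(
    [sys.executable, "-c", "print('hello from child')"],
    stdout=subprocess.PIPE,
    text=True,
)
output, _ = process.communicate()
```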
Quality Checklist
- New features are fully tested (No matter how much Coverage Karma you have)
Coverage Karma
- If your PR decreases test coverage, do you feel you have built enough Coverage Karma* to justify it?
Release/0.1.11
Contents
Minor fixes and improvements
- Remove test bucket environment variable
- Remove environment variable default argument from `GoogleCloudStorageEmulator`
- Add installation, usage, and testing instructions to README
Testing
- Test ability to start more than one Google Cloud Storage emulator at once
Release: 0.1.10
Contents
New Features
- Move Google Cloud Storage emulator into octue package, making it importable
Minor fixes and improvements
- Allow storage emulator to find and use a free port
- Remove need for `STORAGE_EMULATOR_HOST` environment variable for tests
- Avoid assuming custom metadata is set in storage client
- Move `unittest.TestResult` method replacements into Google Cloud emulators module
- Remove `tox` from CI tests, using just GitHub actions instead
Add Google Cloud Storage support; deprecate python < 3.8
Contents
New Features
- Add `GoogleCloudStorageClient`
- Write manifest, its datasets, and its datafiles to cloud in `Analysis.finalise` (#96)
- Closes #84 - add auto tag and release workflow
- Allow Google Cloud storage blobs to be represented by `Pathable`
- Add `Datafile`, `Dataset`, and `Manifest` `to_cloud` and `from_cloud` methods
- Allow regular GCP files to be represented as `Datafile`s
Minor fixes and improvements
- Add cloud storage emulator once for all tests
- Add disk usage and file age utilities
- Allow `Dataset`s to have custom names
- Add `storage.path` module akin to `os.path` but for Google Cloud Storage paths
- Allow `Hashable`s' hash values to be set
- Pass GCP project and bucket names to tests from environment (#93)
- Add ability to delete topic and subscription when a `Service` has finished serving
- Facilitate graceful exit for serving `Service`s on `KeyboardInterrupt`
- Use latest versions of flake8, isort, and black in pre-commit and across all files (#87)
- Fix CI test skipping flag
- Fix documentation links (#92)
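A path module "akin to `os.path`" for cloud storage mainly needs to guarantee forward slashes on every operating system. The helpers below are a hypothetical sketch built on the standard-library `posixpath` module; they are not the API of the SDK's `storage.path` module, and the second helper assumes backslashes are never intended as part of an object name.

```python
import posixpath


def join_cloud_path(*parts):
    """Join path segments with forward slashes regardless of the local operating system."""
    return posixpath.join(*parts)


def to_cloud_path(local_style_path):
    """Convert Windows-style backslashes to the forward slashes used in object names."""
    return local_style_path.replace("\\", "/")


joined = join_cloud_path("my-dataset", "subdirectory", "file.csv")
converted = to_cloud_path("my-dataset\\file.csv")
```

Using `os.path.join` instead would produce backslashes on Windows, which is the kind of separator issue a dedicated module avoids.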
Breaking changes
- Remove testing and explicit support for `python3.6` and `python3.7`
- Remove `base_from` from Pathable and replace with more transparent method
- Rename `Datafile.posix_timestamp` to `Datafile.timestamp` and remove default value
- Make `Datafile.last_modified` private
- Rename `persistence` subpackage to `storage`
Testing
- Test that children can question their own children as part of answering a question
- Close #94 - delete topics and subscriptions at the end of each test
- Remove timeouts from tests and replace with thread executor shutdown upon test pass, meaning that tests that connect to Google Pub/Sub won't fail because the connection is slower than expected
Child services, documentation, easier logging, and CI
Contents
New Features
- Enable use of child services - solving #46.
- To solve #57 we need to be able to define and run local children as well as remote ones.
- This means we must allow multiple services to run locally and independently...
- Which probably means we can also solve octue/twined-server#2 at the same time
- Enable Documentation Build and Serve, Update README #70
- Ultimately we wish to unify documentation between twined and octue-sdk-python, but this is best done when refactoring large chunks of octue-sdk-python into twined (see #69). For now, we just wish to serve what we've got so we can at least link to it.
- Add option to handle developer logs separately from Scientist logs (#78)
- Allow skipping of CI tests if #skip_ci_tests is in the commit body. The use case is to avoid unnecessary computation when you know the tests will fail for a commit but still want to commit.
Minor fixes and improvements
- Implement a proper issue template, either derived from `.github` repo or applied directly (c.f. octue/twined#60)
- Close #32 - stop CLI tests leaving output files in working area.