Releases: octue/octue-sdk-python
Use twined=^0.5.0
Contents (#454)
Dependencies
- Use `twined=^0.5.0`, which removes deprecated support for datasets provided as lists instead of as dictionaries
Remove deprecated code
Summary
Remove deprecated code that's built up over the past few months.
Contents (#450)
IMPORTANT: There are 5 breaking changes.
Refactoring
- BREAKING CHANGE: Remove deprecated cloud bucket name and path parameters. Cloud paths must now be provided as `gs://bucket-name/path/within/bucket` instead of via the `bucket_name` and `path_within_bucket` parameters
- BREAKING CHANGE: Remove deprecated provision of datasets as a list to `Manifest`. Datasets must now be provided as a dictionary mapping each dataset's name to its path or `Dataset` instance
- BREAKING CHANGE: Remove the deprecated `store_datasets` parameter. Upload datasets using `Dataset.to_cloud` after running `Manifest.to_cloud` if you wish to upload the datasets within a manifest
- BREAKING CHANGE: Remove the deprecated `update_cloud_metadata` parameter. This has been renamed to `update_metadata`
- BREAKING CHANGE: Remove `Dataset.from_cloud` and `Dataset.from_local_directory`. Just use the `Dataset` constructor instead, e.g. `Dataset(path="gs://bucket-name/dataset")` or `Dataset(path="local/dataset")`
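The gist of the first two breaking changes can be sketched in plain Python. The `split_gs_path` helper below is hypothetical (not part of the SDK) and exists only to show how the old `bucket_name`/`path_within_bucket` pair maps onto the single `gs://` cloud path now required:

```python
# Illustrative helper (not part of the SDK): shows the mapping between the
# removed `bucket_name`/`path_within_bucket` parameters and the single
# `gs://` cloud path that replaces them.
def split_gs_path(cloud_path):
    """Split a `gs://bucket-name/path/within/bucket` cloud path into the
    bucket name and the path within the bucket."""
    if not cloud_path.startswith("gs://"):
        raise ValueError(f"Not a Google Cloud Storage path: {cloud_path!r}")
    bucket_name, _, path_within_bucket = cloud_path[len("gs://"):].partition("/")
    return bucket_name, path_within_bucket


bucket_name, path_within_bucket = split_gs_path("gs://bucket-name/path/within/bucket")
print(bucket_name)         # -> bucket-name
print(path_within_bucket)  # -> path/within/bucket

# Datasets are now given to `Manifest` as a dictionary mapping each dataset's
# name to its path (or `Dataset` instance), not as a list of paths:
datasets = {
    "my_dataset": "gs://bucket-name/my_dataset",
    "local_dataset": "local/dataset",
}
```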
Allow Dataset.to_cloud to infer cloud location
Contents (#449)
Enhancements
- Allow `Dataset.to_cloud` to infer cloud location
Fixes
- Use cloud paths for relative paths when possible in `Dataset.to_cloud`
Refactoring
- Factor out cloud properties and methods common to `Dataset` and `Datafile` into a new `CloudPathable` mixin
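A minimal sketch of the kind of inference `Dataset.to_cloud` can now perform, assuming a dataset that already knows a cloud path can reuse it when no explicit destination is given (the function name and logic here are illustrative, not the SDK's actual implementation):

```python
# Hypothetical sketch of cloud-location inference: prefer an explicitly
# given cloud path, fall back to a cloud path the dataset already holds.
def infer_cloud_location(explicit_cloud_path, existing_path):
    if explicit_cloud_path is not None:
        return explicit_cloud_path
    if existing_path is not None and existing_path.startswith("gs://"):
        return existing_path
    raise ValueError("No cloud location given and none could be inferred.")


print(infer_cloud_location(None, "gs://my-bucket/dataset"))
# -> gs://my-bucket/dataset
print(infer_cloud_location("gs://other-bucket/copy", "gs://my-bucket/dataset"))
# -> gs://other-bucket/copy
```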
Make Cloud Run services idempotent to questions
Summary
Stop Cloud Run services from running a given question more than once and rely on Pub/Sub retries for connection problems.
Contents (#446)
Enhancements
- Increase default delivery acknowledgement deadline to 120s
Fixes
- Acknowledge and drop questions redelivered to Cloud Run services based on their UUIDs
- Remove redundant question retries in `Service` and raise an error instead if the delivery acknowledgement deadline is reached
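The deduplication strategy above can be sketched in plain Python, assuming each question message carries a unique question UUID; the names and shape here are illustrative, not the SDK's actual API:

```python
# Idempotency sketch: a redelivered question (same UUID) is acknowledged so
# Pub/Sub stops resending it, but is not answered a second time.
answered_question_uuids = set()

def handle_question(question_uuid, answer_question, acknowledge):
    """Answer a question once; acknowledge and drop any redelivery of it."""
    if question_uuid in answered_question_uuids:
        acknowledge()  # Stop redelivery without re-running the question.
        return None
    answered_question_uuids.add(question_uuid)
    answer = answer_question()
    acknowledge()
    return answer


results = []
acks = []
for uuid in ["q-1", "q-1", "q-2"]:  # "q-1" is redelivered once.
    results.append(handle_question(uuid, lambda: "answer", lambda: acks.append(True)))

print(results)    # -> ['answer', None, 'answer']
print(len(acks))  # -> 3
```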
Refactoring
- Factor out saving/updating of local metadata files
- Rename local metadata save function
- Rename `Service.send_exception_to_asker` to `Service.send_exception`
- Use "answer" instead of "response" terminology in `Service`
Unify cloud and local dataset instantiation
Summary
Remove the need for alternative constructors when instantiating datasets from the cloud or from a local directory - the `Dataset` constructor now handles this automatically.
Contents (#445)
Enhancements
- Unify local and cloud dataset instantiation via `Dataset.__init__`
- Raise a deprecation warning if datasets are constructed via `Dataset.from_cloud` or `Dataset.from_local_directory`
Unify metadata update methods
Summary
Add a single method for updating stored datafile and dataset metadata that deduces whether to update the local or cloud metadata.
Contents (#443)
Enhancements
- Add `Datafile.update_metadata` method
- Add `Dataset.update_metadata` method
Refactoring
- Rename the `update_cloud_metadata` parameter to `update_metadata` in `Datafile` instantiation and raise a deprecation warning if the old parameter name is used
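The deduction the new `update_metadata` methods make can be sketched as follows, assuming the choice between cloud and local storage follows from the kind of path the object holds (the function below is a hypothetical illustration, not the SDK's implementation):

```python
# Hedged sketch: decide where a metadata update would be written based on
# whether the datafile/dataset path is a cloud path or a local one.
def choose_metadata_store(path):
    """Return which store a metadata update would be written to."""
    if path.startswith("gs://"):
        return "cloud"  # e.g. update the cloud object's custom metadata.
    return "local"      # e.g. update a local metadata file.


print(choose_metadata_store("gs://bucket-name/datafile.csv"))  # -> cloud
print(choose_metadata_store("local/datafile.csv"))             # -> local
```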
Add ability to download datafiles from a signed URL without cloud permissions
Contents (#440)
Enhancements
- Add ability to download datafiles from a signed URL without cloud permissions
- Add `Datafile.generate_signed_url` method
- Raise error if trying to modify a URL-based datafile
- Raise error if trying to generate a signed URL for a local datafile or dataset
Refactoring
- Use URI terminology in cloud storage path module
Testing
- Test that datafiles and datasets can be downloaded from URLs without cloud permissions
Validate output location in runner instead of twine
Contents (#439)
Enhancements
- Validate output locations given to `Runner`
Dependencies
- Use `twined=^0.4.1`
Allow dataset metadata to be updated
Summary
This release provides public methods and a context manager for easily updating datasets' metadata. It also standardises the internals of metadata getting, setting, and usage across `Datafile` and `Dataset`.
Contents (#436)
New features
- Add context manager for updating dataset stored metadata
Enhancements
- Add new `Metadata` mixin to `Datafile`, `Dataset`, and `Manifest`
- Allow kwargs to be provided to `Dataset.from_cloud`
Fixes
- Stop creating a local metadata file on instantiation of `Dataset`
- Stop implicitly uploading metadata when calling `Dataset.from_cloud`
- Add missing `name` property setter to `Dataset`
- Use correct metadata path for signed URL datasets
Refactoring
- Factor out the `metadata` method into a new `Metadata` mixin
- Rename `Dataset._upload_cloud_metadata` to `Dataset.update_cloud_metadata`
- Rename `Dataset._save_local_metadata` to `Dataset.update_local_metadata`
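The "context manager for updating dataset stored metadata" mentioned above can be illustrated generically with Python's `contextlib` (the store, key, and function names below are hypothetical stand-ins, not the SDK's actual API):

```python
from contextlib import contextmanager

# Illustrative sketch: yield the currently stored metadata for in-place
# mutation, then persist whatever it looks like when the block exits.
@contextmanager
def update_stored_metadata(store, key):
    metadata = dict(store.get(key, {}))  # Load the currently stored metadata.
    yield metadata                       # Let the caller mutate it in place.
    store[key] = metadata                # Persist the result on exit.


store = {"my_dataset": {"labels": ["a"]}}

with update_stored_metadata(store, "my_dataset") as metadata:
    metadata["labels"].append("b")
    metadata["tags"] = {"colour": "green"}

print(store["my_dataset"])  # -> {'labels': ['a', 'b'], 'tags': {'colour': 'green'}}
```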
Simplify datafile metadata resolution order
Summary
This release simplifies the `Datafile` class internals and clarifies its metadata resolution order. Stored metadata will now be used in preference to instantiation metadata unless the `hypothetical` parameter is `True`, allowing the removal of some confusing internal logic from the class.
Contents (#433)
Enhancements
- If `hypothetical` is not `True` when re-instantiating existing datafiles, always use their stored metadata (from the cloud object or local metadata file)
- Store cloud metadata on `Datafile` instances without the `octue__` namespace prefix in its keys
- Make `Datafile` metadata update methods public so they can be called easily by users
- Make it optional whether to include the SDK version in the output of `Datafile.metadata`
- Return `None` from `GoogleCloudStorageClient.get_metadata` if the bucket is not found instead of raising an error
Fixes
- Allow instantiation of a cloud datafile with a non-existent or inaccessible cloud path (defer raising errors until attempting to access it)
Refactoring
- Simplify `Datafile` internals by removing the "initialisation parameters" concept
- Rename `Datafile._get_local_metadata` to `Datafile._use_local_metadata`
- Align the `Datafile._use_cloud_metadata` and `Datafile._use_local_metadata` methods
- Factor out setting `Datafile` instance metadata from stored metadata into a `_set_metadata` method
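The resolution order described in the summary can be sketched as a single rule, assuming stored metadata wins over instantiation metadata unless the datafile is hypothetical (names below are illustrative, not the SDK's internals):

```python
# Sketch of the metadata resolution order: prefer stored metadata (from the
# cloud object or local metadata file) unless `hypothetical` is True or
# nothing is stored yet.
def resolve_metadata(stored_metadata, instantiation_metadata, hypothetical=False):
    if hypothetical or stored_metadata is None:
        return instantiation_metadata
    return stored_metadata


stored = {"labels": ["from-cloud"]}
given = {"labels": ["from-constructor"]}

print(resolve_metadata(stored, given))                     # -> {'labels': ['from-cloud']}
print(resolve_metadata(stored, given, hypothetical=True))  # -> {'labels': ['from-constructor']}
print(resolve_metadata(None, given))                       # -> {'labels': ['from-constructor']}
```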