add a runtime type checker for metadata objects #3400
This is pretty substantial, so I would appreciate a lot of eyes @zarr-developers/python-core-devs. If anyone has concerns about whether we should do any runtime type checking at all, maybe send those thoughts to the issue this PR closes. I'm going to keep working on tests for the type checker, but so far it's working great. This PR does involve violating Liskov for a few subclasses of our …. Similarly, there are lots of …. @TomAugspurger I think you in particular will appreciate some of the effects of this PR, since we can annotate methods like …. That being said, I think the ArrayMetadata class will still need to do some internal consistency checks, like ensuring that the number of dimension names matches the length of ….
Codecov report: overall coverage decreased by 0.39% on this patch.

```
@@            Coverage Diff             @@
##             main    #3400      +/-   ##
==========================================
- Coverage   94.70%   94.31%   -0.39%
==========================================
  Files          79       80       +1
  Lines        9532     9795     +263
==========================================
+ Hits         9027     9238     +211
- Misses        505      557      +52
```
This PR adds a runtime type checker specifically for checking JSON-like data against a type definition. It's currently a draft while I get the test suite happy and refine the API, but it's also ready for people to look at and try out. I'm pretty convinced of its utility, but I also think we should have a good discussion about whether this feature is a good idea.
Demo
The basic API looks like this:
Some aspects might evolve while this is a draft, like the nature of the error messages.
Supported types
This is not a general-purpose type checker. It is targeted at the types relevant to Zarr metadata documents, and so it supports the following narrow set of types:
cost
maintenance burden
The type checker itself is ~530 lines of commented code, broken up into functions which are mostly easy to understand. The TypedDict part, and the logic for resolving generic types, is convoluted and potentially sensitive to changes in how Python exposes type annotations at runtime. Many type annotation features were designed for static type checkers rather than for use within a Python program, so some of this is rather fiddly. But I don't think we are relying on any brittle or private APIs here.
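To illustrate the kind of runtime introspection this requires, the standard `typing` helpers decompose annotations into an "origin" and "args" at runtime. This is a small self-contained snippet, not code from the PR:

```python
from typing import Literal, get_args, get_origin

# Generic aliases decompose into an "origin" and "args" at runtime.
t = dict[str, Literal["group", "array"]]

origin = get_origin(t)      # <class 'dict'>
key_t, val_t = get_args(t)  # (str, Literal['group', 'array'])
allowed = get_args(val_t)   # ('group', 'array')

print(origin, key_t, allowed)
```

These are public, documented APIs, which is why the PR can avoid relying on private internals even though the resolution logic is fiddly.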
performance
As currently implemented, the type checker will report all detectable errors:
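As a toy illustration of what "report all detectable errors" means in practice (the data and messages here are made up, not the PR's actual output):

```python
# Validate two fields and accumulate every mismatch instead of
# stopping at the first one. Field names are illustrative only.
data = {"zarr_format": "3", "shape": [10, "20"]}

errors: list[str] = []
if not isinstance(data["zarr_format"], int):
    errors.append("zarr_format: expected int")
for i, dim in enumerate(data["shape"]):
    if not isinstance(dim, int):
        errors.append(f"shape[{i}]: expected int")

print(errors)  # both problems are reported, not just the first
```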
This is wasted compute when we don't care exactly how mismatched the data is, but it is a better user experience. We might need to tune this if performance becomes a problem, e.g. by introducing a "fail_fast" option that returns on the first error.
benefit
We can instantly remove a lot of special-purpose functions. Most of the functions named `parse_*` (~30+ functions) and essentially all of the functions named `*check_json*` (~30 functions) could be replaced or simplified with the `check_type` function. We can also make our JSON loading routines type-safe:
mypy could not infer the type correctly, but basedpyright does:
We could write a bespoke function that specifically checks all the possibilities for Zarr v3 metadata, but then we would need to painfully modify that function by hand to support something like this:
alternatives
we could use an external JSON validation / type checking library like pydantic, attrs, msgspec, beartype, etc. But I would rather not add a dependency. With the approach in this PR, we keep control in-house, and because this PR just adds functions, it composes with the rest of our codebase as it stands. (FWIW, right now this type checker doesn't do any parsing; it only validates. If you think we should parse instead of just validating, then IMO that's a job for our array metadata classes.)
we could also do nothing, and continue writing JSON parsing code by hand. But I would rather not do that, because this invites bugs and makes it hard to keep up with sneaky spec changes. Specifically, I'm planning on writing a lot of new types to model the codecs defined in #3376, and I would rather just write the type and get the type checking (and type safety) for free.
closes #3285