-
Notifications
You must be signed in to change notification settings - Fork 7
Add FlatFSAdapter #22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Checking the value of
Nor do it, it's just the way it is. 😉
Just a safety measure in case the format of the SHARDING file or other stuff where to every change significantly.
I implicitly answered this in my review already, be just write a new
Some ideas:
|
Regarding To solve that it's easiest to just stuff the entire T = TypeVar('T', bound="_Adapter")
class _Adapter(…):
…
@classmethod
async def create(cls: typing.Type[T], child_datastore: DS, *, default_sharding_fn: str = …, sharding_key: datastore.Key = None, prefix: str = None,**kwargs) -> T:
sharding_fn = await <the-sharding-file-stuff>
return cls(child_datastore, *, _sharding_fn=sharding_fn, **kwargs) BTW: I don't think |
Thanks for sending me down that rabbit hole 😅 For the Additionally, in regards to the |
No, turns out at least the FileSystemDatastore needs one as well as it currently calls
In the |
Made the changes suggested. Thanks for all the help thus far! I can add the default async |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found some more issues with regards to exception handling and typing unfortunately. Also there don't seem to be any tests yet. Your on the right track however! 😉
Exception | ||
Empty shard identifier file. | ||
Exception | ||
Prefix was not present in sharding file | ||
Exception | ||
Sharding file was in incorrect format | ||
Exception | ||
Expect `v1` format | ||
Exception | ||
`key_length` shouold be an integer | ||
Exception | ||
`function_name` was not `prefix`, `suffix`, or `next-to-last` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing raise Exception("<some text>")
is an anti-pattern in Python (unlike JavaScript where the exception type is mostly useless anyways). Please add an exception hierarchy like the following for this:
class InitializationError(RuntimeError):
"""Raised when there was an error with opening this datastore adapter."""
datastore: T
def __init__(self, datastore: T, *args):
self.datastore = datastore
super().__init__(*args)
# Use for other errors
class ShardingStateError(InitializationError):
"""Raised when the existing sharding entry could not be read."""
key: datastore.Key
def __init__(self, datastore: T, key: datastore.Key, *args):
self.key = key
super().__init__(datastore, *args)
# Use for: Empty shard identifier file.
class ShardingStateEmptyError(ShardingStateError):
"""Raised when the existing sharding entry is empty."""
# Use for: Expect `v1` format & `function_name` was not `prefix`, `suffix`, or `next-to-last`
class ShardingStateUnsupportedError(ShardingStateError):
"""Raised when the existing sharding entry uses an unsupported version number or sharding function."""
(The comment lines are just for your orientation.) Add this to the file top, add the relevant entries to __all__
and update all raise statements to raise <ErrorType>(self, sharding_key, "<some text>")
. This way consumers of this API can more easily distinguish between the kinds of errors raised (using try: …; except <ErrorType> as exc: …
). Also some extra error data is included to help callers understand in which context the error occurred.
sharding_func = next( # type: ignore[assignment] # noqa: F821 | ||
iter(await child_datastore.get_all(sharding_key)) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it will return a binary string / bytes object when self
is BinaryAdapter
, but you need a Unicode string so you'll need to call decode
somewhere. I propose the following to support for object and binary datastores:
if isinstance(self, datastore.abc.BinaryAdapter):
# Decode using latin-1 to get a 1:1 binary to Unicode mapping that cannot fail
sharding_func = (await child_datastore.get_all(sharding_key)).decode("latin-1")
else:
# Expect result from object datastore be a Unicode string
sharding_func = typing.cast(str, await child_datastore.get_all(sharding_key))
assert isinstance(sharding_func, str)
if not sharding_func: | ||
raise Exception("Empty shard identifier file.") | ||
|
||
sharding_func = sharding_func.strip() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Place this line above the above exception as bool(" ") is True
(not False
)
if not sharding_fn: | ||
raise Exception(f"Prefix ({prefix}) was not present in {sharding_func}") | ||
|
||
parts: list = sharding_fn.split("/") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the type tying.List[str]
here to prevent type erasure.
funcs: dict = { | ||
"prefix": cls._prefix, # type: ignore[attr-defined] # noqa: F821 | ||
"suffix": cls._suffix, # type: ignore[attr-defined] # noqa: F821 | ||
"next-to-last": cls._next_to_last # type: ignore[attr-defined] # noqa: F821 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this a class variable and change the type to typing.Dict[str, typing.Callable[[datastore.Key, int], datastore.Key]]
for a proper typing signature.
|
||
try: | ||
func = funcs[function_name] | ||
except Exception: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just like above, except Exception:
is an antipattern in Python unless you really mean “catch all non-critical errors”. In this case you only want to catch KeyError
(I presume), so write that instead.
Additionally, if you write something like except …: raise <ErrorType>(…)
in Python, the interpreter will assume that your code crashed while trying to handle the previous exception and will show this in the backtrace. If you are instead raising an exception because you caught another write except … as exc: raise <ErrorType>(…) from exc
instead so that the interpret can show proper explanations of what happened in stack traces.
|
||
prefix: str = "/repo/flatfs/shard/" | ||
_default_sharding_key: datastore.Key = datastore.Key("SHARDING") | ||
_default_sharding_func: str = "v1/prefix/2" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use "v1/next-to-last/2"
here as that is the default go-ipfs uses.
|
||
@classmethod | ||
async def _parse_sharding_function( | ||
cls: typing.Type[T], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use cls: typing.Type[T],
here, it will let you drop a bunch of mypy attr-defined comments in this method.
KEY_TRANSFORM_T = typing.Callable[[datastore.Key], datastore.Key] | ||
T = typing.TypeVar('T', bound="_Adapter") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bound here should be on "_FlatFSAdapter"
, not just "_Adapter"
. That may also fix many mypy complains, if I'm not mistaken…
@adamjsawicki: Are you still intending on working on this? |
Implements a FlatFSAdapter based on https://github.com/ipfs/go-ds-flatfs (@Kubuxu)
It is an implementation of a KeyTransform Adapter, as noted by @Alexander255 here.
A few notes:
SHARDING
file at all seems superfluous. It seems more in line with the other adapters, especiallyNestedPath
that theFlatFSAdapter
would take as argumentssharding_function_name: str
andkey_length: int
that could have defaults as opposed to doing any type of IO and making the user adhere to what seems to be an arbitrary spec.default_prefix
is what it isv1
A few questions:
SHARDING
file. If so, where should it be kept?SHARDING
?_parse_sharding_function
a helper function_get_sharding_function_string
that could be mocked. Thoughts?