Skip to content

Add FlatFSAdapter #22

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

adamjsawicki
Copy link

Implements a FlatFSAdapter based on https://github.com/ipfs/go-ds-flatfs (@Kubuxu)

It is an implementation of a KeyTransform Adapter, as noted by @Alexander255 here.

A few notes:

  • Unless I'm missing something about the implementation, involving a SHARDING file at all seems superfluous. It seems more in line with the other adapters, especially NestedPath that the FlatFSAdapter would take as arguments sharding_function_name: str and key_length: int that could have defaults as opposed to doing any type of IO and making the user adhere to what seems to be an arbitrary spec.
  • Not sure why the default_prefix is what it is
  • Not sure why we have versioning at all if we only accept v1

A few questions:

  • Should we ship with a SHARDING file. If so, where should it be kept?
  • What is the best way to test this? Is there a good way to mock what is in SHARDING?
    • On this note, I suppose we could pull out _parse_sharding_function a helper function _get_sharding_function_string that could be mocked. Thoughts?

@adamjsawicki adamjsawicki requested a review from ntninja February 2, 2020 21:06
@ntninja
Copy link
Collaborator

ntninja commented Feb 2, 2020

Unless I'm missing something about the implementation, involving a SHARDING file at all seems superfluous. It seems more in line with the other adapters, especially NestedPath that the FlatFSAdapter would take as arguments sharding_function_name: str and key_length: int that could have defaults as opposed to doing any type of IO and making the user adhere to what seems to be an arbitrary spec.

Checking the value of SHARDING is hugely important so that we don't corrupt existing datastores that don't use our preferred sharding function. Compatibility with the existing spec is an important goal here IMHO.

Not sure why the default_prefix is what it is

Nor do it, it's just the way it is. 😉

Not sure why we have versioning at all if we only accept v1

Just a safety measure in case the format of the SHARDING file or other stuff where to every change significantly.

Should we ship with a SHARDING file. If so, where should it be kept?

I implicitly answered this in my review already, be just write a new SHARDING key/file with our preferred sharding function if there is none already.

What is the best way to test this? Is there a good way to mock what is in SHARDING?

Some ideas:

  • Copy some (small) real-world repositories from go-ipfs to the test directory and query those.
  • Layer this adapter on top of DictDatastore (like in your doctest example) and set the value of the SHARDING key on that before initializing this adapter.

@ntninja
Copy link
Collaborator

ntninja commented Feb 3, 2020

Regarding await: You just stumbled on the joys of “two-colored“ functions there…

To solve that it's easiest to just stuff the entire await startup code into a new construction function and be done with it. Something like:

T = TypeVar('T', bound="_Adapter")
class _Adapter(…):
    …
    @classmethod
    async def create(cls: typing.Type[T], child_datastore: DS, *, default_sharding_fn: str = …, sharding_key: datastore.Key = None, prefix: str = None,**kwargs) -> T:
       sharding_fn = await <the-sharding-file-stuff>
       return cls(child_datastore, *, _sharding_fn=sharding_fn, **kwargs)

BTW: I don't think prefix should be configurable as a parameter here, better to make it a class level constant.

@adamjsawicki
Copy link
Author

adamjsawicki commented Feb 14, 2020

Thanks for sending me down that rabbit hole 😅

For the create function, do you see an issue with having one of the Adapters use a factory pattern while none of the others do? Or do you want all of the Adapters to conform to that pattern?

Additionally, in regards to the child_datastore, we still need access to it in the initialization of FlatFSAdapter, meaning that the top level object (i.e. one that inherits from both this Adapter and a Datastore) that would ultimately have access to a child_datastore does not exist yet, still barring our access.

@ntninja
Copy link
Collaborator

ntninja commented Feb 15, 2020

For the create function, do you see an issue with having one of the Adapters use a factory pattern while none of the others do? Or do you want all of the Adapters to conform to that pattern?

No, turns out at least the FileSystemDatastore needs one as well as it currently calls mkdir synchronously and I'd very surprised if some of the other backends (like leveldb) didn't need one at some point. Based on this, I'll actually add a default async .create() method to all datastores that just delegates to the synchronous __init__ soon and datastores that don't support synchronous initialization can then just reject that mode of creation.

Additionally, in regards to the child_datastore, we still need access to it in the initialization of FlatFSAdapter, meaning that the top level object (i.e. one that inherits from both this Adapter and a Datastore) that would ultimately have access to a child_datastore does not exist yet, still barring our access.

In the create example I posted it was already available as part of the somewhat opaque args parameter – I've slightly updated the example to make that clearer. Bottom line is: Since the value is forwarded as an argument to the the constructor of DatastoreAdapter, if had to be present, one way or another, at this level as well; it wasn't very obvious where it was.

@adamjsawicki
Copy link
Author

Made the changes suggested. Thanks for all the help thus far!

I can add the default async .create() method and some documentation around it in another PR/appended to this one if you'd like.

@adamjsawicki adamjsawicki requested a review from ntninja February 20, 2020 05:56
Copy link
Collaborator

@ntninja ntninja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found some more issues with regards to exception handling and typing unfortunately. Also there don't seem to be any tests yet. Your on the right track however! 😉

Comment on lines +443 to +454
Exception
Empty shard identifier file.
Exception
Prefix was not present in sharding file
Exception
Sharding file was in incorrect format
Exception
Expect `v1` format
Exception
`key_length` shouold be an integer
Exception
`function_name` was not `prefix`, `suffix`, or `next-to-last`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing raise Exception("<some text>") is an anti-pattern in Python (unlike JavaScript where the exception type is mostly useless anyways). Please add an exception hierarchy like the following for this:

class InitializationError(RuntimeError):
	"""Raised when there was an error with opening this datastore adapter."""
	datastore: T
	
	def __init__(self, datastore: T, *args):
		self.datastore = datastore
		super().__init__(*args)

# Use for other errors
class ShardingStateError(InitializationError):
	"""Raised when the existing sharding entry could not be read."""
	key: datastore.Key
	
	def __init__(self, datastore: T, key: datastore.Key, *args):
		self.key = key
		super().__init__(datastore, *args)

# Use for: Empty shard identifier file.
class ShardingStateEmptyError(ShardingStateError):
	"""Raised when the existing sharding entry is empty."""

# Use for: Expect `v1` format & `function_name` was not `prefix`, `suffix`, or `next-to-last`
class ShardingStateUnsupportedError(ShardingStateError):
	"""Raised when the existing sharding entry uses an unsupported version number or sharding function."""

(The comment lines are just for your orientation.) Add this to the file top, add the relevant entries to __all__ and update all raise statements to raise <ErrorType>(self, sharding_key, "<some text>"). This way consumers of this API can more easily distinguish between the kinds of errors raised (using try: …; except <ErrorType> as exc: …). Also some extra error data is included to help callers understand in which context the error occurred.

Comment on lines +465 to +467
sharding_func = next( # type: ignore[assignment] # noqa: F821
iter(await child_datastore.get_all(sharding_key))
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it will return a binary string / bytes object when self is BinaryAdapter, but you need a Unicode string so you'll need to call decode somewhere. I propose the following to support for object and binary datastores:

if isinstance(self, datastore.abc.BinaryAdapter):
	# Decode using latin-1 to get a 1:1 binary to Unicode mapping that cannot fail
	sharding_func = (await child_datastore.get_all(sharding_key)).decode("latin-1")
else:
	# Expect result from object datastore be a Unicode string
	sharding_func = typing.cast(str, await child_datastore.get_all(sharding_key))
	assert isinstance(sharding_func, str)

if not sharding_func:
raise Exception("Empty shard identifier file.")

sharding_func = sharding_func.strip()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Place this line above the above exception as bool(" ") is True (not False)

if not sharding_fn:
raise Exception(f"Prefix ({prefix}) was not present in {sharding_func}")

parts: list = sharding_fn.split("/")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the type tying.List[str] here to prevent type erasure.

Comment on lines +505 to +509
funcs: dict = {
"prefix": cls._prefix, # type: ignore[attr-defined] # noqa: F821
"suffix": cls._suffix, # type: ignore[attr-defined] # noqa: F821
"next-to-last": cls._next_to_last # type: ignore[attr-defined] # noqa: F821
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this a class variable and change the type to typing.Dict[str, typing.Callable[[datastore.Key, int], datastore.Key]] for a proper typing signature.


try:
func = funcs[function_name]
except Exception:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just like above, except Exception: is an antipattern in Python unless you really mean “catch all non-critical errors”. In this case you only want to catch KeyError (I presume), so write that instead.

Additionally, if you write something like except …: raise <ErrorType>(…) in Python, the interpreter will assume that your code crashed while trying to handle the previous exception and will show this in the backtrace. If you are instead raising an exception because you caught another write except … as exc: raise <ErrorType>(…) from exc instead so that the interpret can show proper explanations of what happened in stack traces.


prefix: str = "/repo/flatfs/shard/"
_default_sharding_key: datastore.Key = datastore.Key("SHARDING")
_default_sharding_func: str = "v1/prefix/2"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use "v1/next-to-last/2" here as that is the default go-ipfs uses.


@classmethod
async def _parse_sharding_function(
cls: typing.Type[T],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use cls: typing.Type[T], here, it will let you drop a bunch of mypy attr-defined comments in this method.

KEY_TRANSFORM_T = typing.Callable[[datastore.Key], datastore.Key]
T = typing.TypeVar('T', bound="_Adapter")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bound here should be on "_FlatFSAdapter", not just "_Adapter". That may also fix many mypy complains, if I'm not mistaken…

@ntninja
Copy link
Collaborator

ntninja commented Mar 28, 2020

@adamjsawicki: Are you still intending on working on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants