method resolution surprise on EarthAccessFile #610

Open
itcarroll opened this issue Jun 25, 2024 · 7 comments · Fixed by #620 · May be fixed by #828
Labels
type: bug Something isn't working

Comments

@itcarroll
Collaborator

itcarroll commented Jun 25, 2024

There's something funny going on with the wrapping of fsspec.spec.AbstractBufferedFile, where the method resolution isn't coming out as expected.

The following gives me an AssertionError:

import earthaccess

auth = earthaccess.login()
results = earthaccess.search_data(short_name="VIIRSJ1_L2_OC", count=1)
files = earthaccess.open(results)
with files[0] as g:
    assert g.read is getattr(g.f, "read")

I don't know the internals, but getattr(g, "read") and g.__getattribute__("read") both follow the MRO to the read method of fsspec.spec.AbstractBufferedFile. We don't want that! We want the read method of the wrapped file, fsspec.implementations.http.HTTPFile.read, which is what g.__getattr__("read") returns via getattr(self.f, method) in the class definition:

class EarthAccessFile(fsspec.spec.AbstractBufferedFile):
    def __init__(self, f: fsspec.AbstractFileSystem, granule: DataGranule) -> None:
        self.f = f
        self.granule = granule

    def __getattr__(self, method: str) -> Any:
        return getattr(self.f, method)

Is it possible that __getattr__ should be __getattribute__?

Sorry for the multiple edits; confused myself with two fs

@itcarroll changed the title from "method resolution surprise on EarthAccessGranule" to "method resolution surprise on EarthAccessFile" on Jun 25, 2024
@itcarroll
Collaborator Author

No, don't use __getattribute__, because it leads to infinite recursion. New idea: this class should not be subclassing anything at all.
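A minimal sketch of that recursion, using hypothetical stand-in classes (Inner and Naive are not part of earthaccess or fsspec). Inside __getattribute__, the expression self.f is itself an attribute lookup, so it re-enters the same method until Python gives up:

```python
class Inner:
    """Hypothetical stand-in for the wrapped file object."""

    def read(self):
        return b"inner"


class Naive:
    """Hypothetical wrapper that forwards every lookup via __getattribute__."""

    def __init__(self, f):
        self.f = f  # assignment goes through __setattr__, so this is safe

    def __getattribute__(self, name):
        # `self.f` triggers __getattribute__("f"), which evaluates `self.f`
        # again, and so on without end.
        return getattr(self.f, name)


wrapper = Naive(Inner())
try:
    wrapper.read
    recursed = False
except RecursionError:
    recursed = True

print(recursed)  # → True
```

The usual escape hatch is to fetch the wrapped object with object.__getattribute__(self, "f") inside the override, but that still intercepts every lookup, including ones the wrapper should answer itself.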

@itcarroll
Collaborator Author

When #620 removed the inherited fsspec.spec.AbstractBufferedFile class from earthaccess.EarthAccessFile, it broke the ability of xarray.backends.list_engines()["h5netcdf"].guess_can_open to recognize these pointers. It comes back to MRO, and the fact that EarthAccessFile is no longer an instance of io.IOBase.
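To illustrate the isinstance point, here is a small sketch with two hypothetical wrappers (PlainWrapper and IOBaseWrapper are made-up names, and io.BytesIO stands in for the remote file). It assumes, as the comment above suggests, that the engine check depends on the object being an instance of io.IOBase; it does not reproduce xarray's actual logic:

```python
import io


class PlainWrapper:
    """Hypothetical wrapper with no base class, delegating via __getattr__."""

    def __init__(self, f):
        self.f = f

    def __getattr__(self, name):
        return getattr(self.f, name)


class IOBaseWrapper(io.IOBase):
    """Same delegation, but with io.IOBase in the MRO."""

    def __init__(self, f):
        self.f = f

    def __getattr__(self, name):
        return getattr(self.f, name)


buffer = io.BytesIO(b"\x89HDF")
plain = PlainWrapper(buffer)
based = IOBaseWrapper(buffer)

# Delegation makes the plain wrapper behave like a file...
data = plain.read(4)
print(data == b"\x89HDF")            # → True

# ...but isinstance looks at the type's MRO, not at delegated attributes.
print(isinstance(plain, io.IOBase))  # → False
print(isinstance(based, io.IOBase))  # → True
```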

@itcarroll reopened this on Sep 24, 2024
@itcarroll
Collaborator Author

It would be very helpful if we had an integration test for the behavior that the EarthAccessFile class enables.

@itcarroll
Collaborator Author

itcarroll commented Sep 25, 2024

One way to repair the damage done in #620 is to reintroduce a base class on EarthAccessFile (I think io.IOBase might work), but that doesn't fix this issue.
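A sketch of why an io.IOBase base would leave the problem in place, using a hypothetical Wrapper class and an in-memory io.BytesIO as the wrapped file. io.IOBase defines methods such as tell() and seek() but declares no read(), so only some names ever reach __getattr__:

```python
import io


class Wrapper(io.IOBase):
    """Hypothetical wrapper: io.IOBase base plus __getattr__ delegation."""

    def __init__(self, f):
        self.f = f
        self.delegated = []  # names that actually reached __getattr__

    def __getattr__(self, name):
        self.delegated.append(name)
        return getattr(self.f, name)


w = Wrapper(io.BytesIO(b"abc"))

w.read  # not defined on io.IOBase, so lookup falls through to __getattr__
w.tell  # defined on io.IOBase, so the MRO wins and __getattr__ is skipped
w.seek  # same as tell

print("read" in w.delegated)  # → True
print("tell" in w.delegated)  # → False
print("seek" in w.delegated)  # → False
```

Under this setup, any method that io.IOBase provides itself would still be answered by the base class rather than by the wrapped file.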

@itcarroll
Collaborator Author

Merged #832 in order to make way for a release. Note that the test surfacing this bug has been marked as an expected fail.

@pytest.mark.xfail(
    reason="This test reproduces a bug (#610) which has not yet been fixed."
)
def test_earthaccess_file_getattr():
    fs = fsspec.filesystem("memory")
    with fs.open("/foo", "wb") as f:
        earthaccess_file = EarthAccessFile(f, granule="foo")
        assert f.tell() == earthaccess_file.tell()
    # cleanup
    fs.store.clear()

nikki-t added a commit that referenced this issue Oct 1, 2024
@alexandervladsemenov

I would like to clarify a couple of things.

First of all, bound methods should be compared with "==", not "is", because each attribute access creates a new bound method object:

class Dummy:
    def read(self): pass

d = Dummy()
print(d.read is d.read)  # → False
print(d.read == d.read)  # → True

Now running

auth = earthaccess.login()
results = earthaccess.search_data(short_name="VIIRSJ1_L2_OC", count=1)
files = earthaccess.open(results)
with files[0] as g:
    print(f"Type of g: {type(g)}")
    print(f"Has g.read? {hasattr(g, 'read')}")
    print(f"g.read: {g.read}")
    print(f"Has g.f? {hasattr(g, 'f')}")
    if hasattr(g, 'f'):
        print(f"Type of g.f: {type(g.f)}")
        print(f"Has g.f.read? {hasattr(g.f, 'read')}")
        if hasattr(g.f, 'read'):
            print(f"g.f.read: {getattr(g.f, 'read')}")
            print(f"g.read is g.f.read: {g.read == getattr(g.f, 'read')}")
    else:
        print("g.f does not exist")
    assert g.read == getattr(g.f, "read")

gives the following output:

QUEUEING TASKS | : 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1534.13it/s]
PROCESSING TASKS | : 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 2.14it/s]
COLLECTING RESULTS | : 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 7025.63it/s]
Type of g: <class 'earthaccess.store.EarthAccessFile'>
Has g.read? True
g.read: <bound method AbstractBufferedFile.read of <File-like object HTTPFileSystem, https://oceandata.sci.gsfc.nasa.gov/cmr/getfile/JPSS1_VIIRS.20171129T213001.L2.OC.nc>>
Has g.f? True
Type of g.f: <class 'fsspec.implementations.http.HTTPStreamFile'>
Has g.f.read? True
g.f.read: <bound method HTTPStreamFile._read of <File-like object HTTPFileSystem, https://oceandata.sci.gsfc.nasa.gov/cmr/getfile/JPSS1_VIIRS.20171129T213001.L2.OC.nc>>
g.read is g.f.read: False
Traceback (most recent call last):
  File "/home/obpg/scripts/test_earthdata.py", line 26, in <module>
    assert g.read == getattr(g.f, "read")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError

Modifying the class as follows:

class EarthAccessFile(fsspec.spec.AbstractBufferedFile):
    """Handle for a file-like object pointing to an on-prem or Earthdata Cloud granule."""

    def __init__(
        self, f: fsspec.spec.AbstractBufferedFile, granule: DataGranule
    ) -> None:
        """EarthAccessFile connects an Earthdata search result with an open file-like object.

        No methods exist on the class, which passes all attribute and method calls
        directly to the file-like object given during initialization. An instance of
        this class can be treated like that file-like object itself.

        Parameters:
            f: a file-like object
            granule: a granule search result
        """
        import types
        import inspect
        self.f = f
        self.granule = granule
        # Automatically copy all methods from self.f
        for name, member in inspect.getmembers(f):
            if inspect.ismethod(member) or inspect.isfunction(member):
                setattr(self, name, types.MethodType(member.__func__, f))

    def __getattr__(self, method: str) -> Any:
        return getattr(self.f, method)

    def __reduce__(self) -> Any:
        return make_instance, (
            type(self.f),
            self.granule,
            earthaccess.__auth__,
            dumps(self.f),
        )

    def __repr__(self) -> str:
        return repr(self.f)

fixes the issue:

QUEUEING TASKS | : 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1556.33it/s]
PROCESSING TASKS | : 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 2.55it/s]
COLLECTING RESULTS | : 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 5468.45it/s]
Type of g: <class 'earthaccess.store.EarthAccessFile'>
Has g.read? True
g.read: <bound method HTTPStreamFile._read of <File-like object HTTPFileSystem, https://oceandata.sci.gsfc.nasa.gov/cmr/getfile/JPSS1_VIIRS.20171129T213001.L2.OC.nc>>
Has g.f? True
Type of g.f: <class 'fsspec.implementations.http.HTTPStreamFile'>
Has g.f.read? True
g.f.read: <bound method HTTPStreamFile._read of <File-like object HTTPFileSystem, https://oceandata.sci.gsfc.nasa.gov/cmr/getfile/JPSS1_VIIRS.20171129T213001.L2.OC.nc>>
g.read is g.f.read: True

If we want to resolve the issue only for the read method, then

        # Automatically copy all methods from self.f
        for name, member in inspect.getmembers(f):
            if inspect.ismethod(member) or inspect.isfunction(member):
                setattr(self, name, types.MethodType(member.__func__, f))

should be replaced with

self.read = f.read
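Why the single assignment works can be sketched with hypothetical stand-in classes (Base plays the role of the inherited fsspec class, Inner the wrapped file). Plain functions on a class are non-data descriptors, so an entry in the instance __dict__ is found before the inherited method:

```python
class Base:
    """Hypothetical stand-in for the inherited base class."""

    def read(self):
        return "Base.read"


class Inner:
    """Hypothetical stand-in for the wrapped file object."""

    def read(self):
        return "Inner.read"


class Wrapper(Base):
    def __init__(self, f):
        self.f = f
        # Stored in the instance __dict__, which takes precedence over
        # Base.read because functions are non-data descriptors.
        self.read = f.read


w = Wrapper(Inner())
print(w.read())            # → Inner.read
print(w.read == w.f.read)  # → True
print("read" in vars(w))   # → True
```

Note that this only shadows ordinary methods. A property on the base class is a data descriptor and would still take precedence over an instance attribute of the same name.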

@alexandervladsemenov

Important to note:
https://docs.python.org/3/reference/datamodel.html#object.__getattr__:

Called when the default attribute access fails with an AttributeError (either __getattribute__() raises an AttributeError because name is not an instance attribute or an attribute in the class tree for self; or __get__() of a name property raises AttributeError). This method should either return the (computed) attribute value or raise an AttributeError exception. The object class itself does not provide this method.

Note that if the attribute is found through the normal mechanism, __getattr__() is not called. (This is an intentional asymmetry between __getattr__() and __setattr__().) This is done both for efficiency reasons and because otherwise __getattr__() would have no way to access other attributes of the instance. Note that at least for instance variables, you can take total control by not inserting any values in the instance attribute dictionary (but instead inserting them in another object). See the __getattribute__() method below for a way to actually get total control over attribute access.

This means that:

    def __getattr__(self, method: str) -> Any:
        return getattr(self.f, method)

doesn't override existing methods like read.
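A self-contained sketch of that fallback behavior, with hypothetical classes standing in for the real ones (Base for the inherited class, Inner for the wrapped file):

```python
class Base:
    """Hypothetical stand-in for the inherited base class."""

    def read(self):
        return "Base.read"


class Inner:
    """Hypothetical stand-in for the wrapped file object."""

    def read(self):
        return "Inner.read"

    def extra(self):
        return "Inner.extra"


class Wrapper(Base):
    def __init__(self, f):
        self.f = f

    def __getattr__(self, name):
        # Reached only after normal lookup has raised AttributeError.
        return getattr(self.f, name)


w = Wrapper(Inner())
print(w.read())   # → Base.read   (found on Base, __getattr__ never runs)
print(w.extra())  # → Inner.extra (not found normally, so it is delegated)
```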
