Azure integration tests failing #379

@TomAugspurger

Discovered in #378, the Azure integration tests are failing:

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Timeout +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Stack of ThreadPoolExecutor-2_0 (140540411098880) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  File "/opt/conda/lib/python3.8/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/opt/conda/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/opt/conda/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.8/concurrent/futures/thread.py", line 78, in _worker
    work_item = work_queue.get(block=True)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Stack of IO loop (140540401657600) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  File "/opt/conda/lib/python3.8/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/opt/conda/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/opt/conda/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.8/site-packages/distributed/utils.py", line 499, in run_loop
    loop.start()
  File "/opt/conda/lib/python3.8/site-packages/tornado/platform/asyncio.py", line 199, in start
    self.asyncio_loop.run_forever()
  File "/opt/conda/lib/python3.8/asyncio/base_events.py", line 570, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.8/asyncio/base_events.py", line 1859, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.8/asyncio/events.py", line 81, in _run
    self._context.run(self._callback, *self._args)
  File "/dask-cloudprovider/dask_cloudprovider/generic/vmcluster.py", line 339, in _start
    await super()._start()
  File "/opt/conda/lib/python3.8/site-packages/distributed/deploy/spec.py", line 309, in _start
    self.scheduler = await self.scheduler
  File "/opt/conda/lib/python3.8/site-packages/distributed/deploy/spec.py", line 64, in _
    await self.start()
  File "/dask-cloudprovider/dask_cloudprovider/generic/vmcluster.py", line 90, in start
    await self.wait_for_scheduler()
  File "/dask-cloudprovider/dask_cloudprovider/generic/vmcluster.py", line 50, in wait_for_scheduler
    while not is_socket_open(ip, port):
  File "/dask-cloudprovider/dask_cloudprovider/utils/socket.py", line 7, in is_socket_open
    connection.connect((ip, int(port)))

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Timeout +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
FAILED
dask-cloudprovider/dask_cloudprovider/azure/tests/test_azurevm.py::test_create_rapids_cluster_sync ERROR
dask-cloudprovider/dask_cloudprovider/azure/tests/test_azurevm.py::test_render_cloud_init FAILED
dask-cloudprovider/dask_cloudprovider/azure/tests/test_azurevm.py::test_render_cloud_init ERROR

=================================================================================================================================== ERRORS ===================================================================================================================================
____________________________________________________________________________________________________________ ERROR at teardown of test_create_rapids_cluster_sync ____________________________________________________________________________________________________________

fixturedef = <FixtureDef argname='event_loop' scope='function' baseid=''>, request = <SubRequest 'event_loop' for <Function test_create_rapids_cluster_sync>>

    @pytest.hookimpl(trylast=True)
    def pytest_fixture_post_finalizer(fixturedef: FixtureDef, request: SubRequest) -> None:
        """Called after fixture teardown"""
        if fixturedef.argname == "event_loop":
            policy = asyncio.get_event_loop_policy()
            try:
                loop = policy.get_event_loop()
            except RuntimeError:
                loop = None
            if loop is not None:
                # Clean up existing loop to avoid ResourceWarnings
>               loop.close()

opt/conda/lib/python3.8/site-packages/pytest_asyncio/plugin.py:364:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
opt/conda/lib/python3.8/asyncio/unix_events.py:58: in close
    super().close()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <_UnixSelectorEventLoop running=True closed=False debug=False>

    def close(self):
        if self.is_running():
>           raise RuntimeError("Cannot close a running event loop")
E           RuntimeError: Cannot close a running event loop

opt/conda/lib/python3.8/asyncio/selector_events.py:89: RuntimeError
________________________________________________________________________________________________________________ ERROR at teardown of test_render_cloud_init _________________________________________________________________________________________________________________

fixturedef = <FixtureDef argname='event_loop' scope='function' baseid=''>, request = <SubRequest 'event_loop' for <Function test_render_cloud_init>>

    @pytest.hookimpl(trylast=True)
    def pytest_fixture_post_finalizer(fixturedef: FixtureDef, request: SubRequest) -> None:
        """Called after fixture teardown"""
        if fixturedef.argname == "event_loop":
            policy = asyncio.get_event_loop_policy()
            try:
                loop = policy.get_event_loop()
            except RuntimeError:
                loop = None
            if loop is not None:
                # Clean up existing loop to avoid ResourceWarnings
>               loop.close()

opt/conda/lib/python3.8/site-packages/pytest_asyncio/plugin.py:364:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
opt/conda/lib/python3.8/asyncio/unix_events.py:58: in close
    super().close()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <_UnixSelectorEventLoop running=True closed=False debug=False>

    def close(self):
        if self.is_running():
>           raise RuntimeError("Cannot close a running event loop")
E           RuntimeError: Cannot close a running event loop

opt/conda/lib/python3.8/asyncio/selector_events.py:89: RuntimeError
================================================================================================================================== FAILURES ==================================================================================================================================
______________________________________________________________________________________________________________________ test_create_rapids_cluster_sync _______________________________________________________________________________________________________________________

    @pytest.mark.asyncio
    @pytest.mark.timeout(1200)
    @skip_without_credentials
    @pytest.mark.external
    async def test_create_rapids_cluster_sync():

>       with AzureVMCluster(
            vm_size="Standard_NC12s_v3",
            docker_image="rapidsai/rapidsai:cuda11.0-runtime-ubuntu18.04-py3.8",
            worker_class="dask_cuda.CUDAWorker",
            worker_options={"rmm_pool_size": "15GB"},
        ) as cluster:

dask-cloudprovider/dask_cloudprovider/azure/tests/test_azurevm.py:88:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
dask-cloudprovider/dask_cloudprovider/azure/azurevm.py:570: in __init__
    super().__init__(debug=debug, **kwargs)
dask-cloudprovider/dask_cloudprovider/generic/vmcluster.py:297: in __init__
    super().__init__(**kwargs, security=self.security)
opt/conda/lib/python3.8/site-packages/distributed/deploy/spec.py:275: in __init__
    self.sync(self._start)
opt/conda/lib/python3.8/site-packages/distributed/utils.py:338: in sync
    return sync(
opt/conda/lib/python3.8/site-packages/distributed/utils.py:401: in sync
    wait(10)
opt/conda/lib/python3.8/site-packages/distributed/utils.py:390: in wait
    return e.wait(timeout)
opt/conda/lib/python3.8/threading.py:558: in wait
    signaled = self._cond.wait(timeout)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <Condition(<unlocked _thread.lock object at 0x7fd20dd94420>, 0)>, timeout = 10

    def wait(self, timeout=None):
        """Wait until notified or until a timeout occurs.

        If the calling thread has not acquired the lock when this method is
        called, a RuntimeError is raised.

        This method releases the underlying lock, and then blocks until it is
        awakened by a notify() or notify_all() call for the same condition
        variable in another thread, or until the optional timeout occurs. Once
        awakened or timed out, it re-acquires the lock and returns.

        When the timeout argument is present and not None, it should be a
        floating point number specifying a timeout for the operation in seconds
        (or fractions thereof).

        When the underlying lock is an RLock, it is not released using its
        release() method, since this may not actually unlock the lock when it
        was acquired multiple times recursively. Instead, an internal interface
        of the RLock class is used, which really unlocks it even when it has
        been recursively acquired several times. Another internal interface is
        then used to restore the recursion level when the lock is reacquired.

        """
        if not self._is_owned():
            raise RuntimeError("cannot wait on un-acquired lock")
        waiter = _allocate_lock()
        waiter.acquire()
        self._waiters.append(waiter)
        saved_state = self._release_save()
        gotit = False
        try:    # restore state no matter what (e.g., KeyboardInterrupt)
            if timeout is None:
                waiter.acquire()
                gotit = True
            else:
                if timeout > 0:
>                   gotit = waiter.acquire(True, timeout)
E                   Failed: Timeout >1200.0s

opt/conda/lib/python3.8/threading.py:306: Failed
___________________________________________________________________________________________________________________________ test_render_cloud_init ___________________________________________________________________________________________________________________________

args = (), kwargs = {}, coro = <coroutine object test_render_cloud_init at 0x7fd20da17640>

    @functools.wraps(func)
    def inner(*args, **kwargs):
        coro = func(*args, **kwargs)
        if not inspect.isawaitable(coro):
            pyfuncitem.warn(
                pytest.PytestWarning(
                    f"The test {pyfuncitem} is marked with '@pytest.mark.asyncio' "
                    "but it is not an async function. "
                    "Please remove asyncio marker. "
                    "If the test is not marked explicitly, "
                    "check for global markers applied via 'pytestmark'."
                )
            )
            return
>       task = asyncio.ensure_future(coro, loop=_loop)

opt/conda/lib/python3.8/site-packages/pytest_asyncio/plugin.py:452:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
opt/conda/lib/python3.8/asyncio/tasks.py:672: in ensure_future
    task = loop.create_task(coro_or_future)
opt/conda/lib/python3.8/asyncio/base_events.py:429: in create_task
    self._check_closed()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <_UnixSelectorEventLoop running=False closed=True debug=False>

    def _check_closed(self):
        if self._closed:
>           raise RuntimeError('Event loop is closed')
E           RuntimeError: Event loop is closed

opt/conda/lib/python3.8/asyncio/base_events.py:508: RuntimeError
============================================================================================================================== warnings summary ==============================================================================================================================
dask_cloudprovider/azure/tests/test_azurevm.py::test_create_cluster
dask_cloudprovider/azure/tests/test_azurevm.py::test_create_cluster_sync
  /opt/conda/lib/python3.8/contextlib.py:120: UserWarning: Creating your cluster is taking a surprisingly long time. This is likely due to pending resources. Hang tight!
    next(self.gen)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================================================================================================================== short test summary info ===========================================================================================================================
FAILED dask-cloudprovider/dask_cloudprovider/azure/tests/test_azurevm.py::test_create_rapids_cluster_sync - Failed: Timeout >1200.0s
FAILED dask-cloudprovider/dask_cloudprovider/azure/tests/test_azurevm.py::test_render_cloud_init - RuntimeError: Event loop is closed
ERROR dask-cloudprovider/dask_cloudprovider/azure/tests/test_azurevm.py::test_create_rapids_cluster_sync - RuntimeError: Cannot close a running event loop
ERROR dask-cloudprovider/dask_cloudprovider/azure/tests/test_azurevm.py::test_render_cloud_init - RuntimeError: Cannot close a running event loop
======================================================================================================= 2 failed, 3 passed, 2 warnings, 2 errors in 2228.83s (0:37:08) =======================================================================================================
sys:1: RuntimeWarning: coroutine 'test_render_cloud_init' was never awaited

I'll look into this once I can figure out how to enable all the logs in Azure.
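
For anyone reproducing this: in the traceback above, the IO loop thread is sitting in `connection.connect((ip, int(port)))` inside `is_socket_open`, which `wait_for_scheduler` calls in a `while not ...` loop, and nothing appears to stop that loop before the 1200 s pytest timeout fires. Below is a minimal sketch of a bounded version of that kind of wait, to make the hang fail fast with a clear message while debugging. The names `probe_socket` and `wait_for_port` and all default values are hypothetical, chosen for illustration; this is not the dask-cloudprovider implementation, and it is a blocking sketch rather than the async form the library uses.

```python
import socket
import time


def probe_socket(ip, port, connect_timeout=5.0):
    """Return True if a TCP connection to (ip, port) succeeds in time.

    Hypothetical helper: the per-attempt timeout keeps a single connect
    call from blocking for as long as the OS default would allow.
    """
    try:
        with socket.create_connection((ip, int(port)), timeout=connect_timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def wait_for_port(ip, port, deadline=600.0, poll_interval=1.0):
    """Poll until (ip, port) accepts connections or the deadline passes.

    Hypothetical helper: raises TimeoutError instead of looping forever,
    so the failure points at the unreachable scheduler rather than at
    the test runner's global timeout.
    """
    start = time.monotonic()
    while not probe_socket(ip, port):
        if time.monotonic() - start > deadline:
            raise TimeoutError(
                f"{ip}:{port} did not accept connections within {deadline}s"
            )
        time.sleep(poll_interval)
```

With a deadline shorter than the pytest timeout, a scheduler VM that never opens its port would surface as a `TimeoutError` naming the address, which may be easier to correlate with the Azure-side logs.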

Labels: bug, provider/azure/vm