Open
Labels
bug (Something isn't working), provider/azure/vm (Cluster provider for Azure Virtual Machines)
Description
The Azure integration tests are failing, as discovered in #378:
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Timeout +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Stack of ThreadPoolExecutor-2_0 (140540411098880) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
File "/opt/conda/lib/python3.8/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/opt/conda/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/lib/python3.8/concurrent/futures/thread.py", line 78, in _worker
work_item = work_queue.get(block=True)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Stack of IO loop (140540401657600) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
File "/opt/conda/lib/python3.8/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/opt/conda/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/opt/conda/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/lib/python3.8/site-packages/distributed/utils.py", line 499, in run_loop
loop.start()
File "/opt/conda/lib/python3.8/site-packages/tornado/platform/asyncio.py", line 199, in start
self.asyncio_loop.run_forever()
File "/opt/conda/lib/python3.8/asyncio/base_events.py", line 570, in run_forever
self._run_once()
File "/opt/conda/lib/python3.8/asyncio/base_events.py", line 1859, in _run_once
handle._run()
File "/opt/conda/lib/python3.8/asyncio/events.py", line 81, in _run
self._context.run(self._callback, *self._args)
File "/dask-cloudprovider/dask_cloudprovider/generic/vmcluster.py", line 339, in _start
await super()._start()
File "/opt/conda/lib/python3.8/site-packages/distributed/deploy/spec.py", line 309, in _start
self.scheduler = await self.scheduler
File "/opt/conda/lib/python3.8/site-packages/distributed/deploy/spec.py", line 64, in _
await self.start()
File "/dask-cloudprovider/dask_cloudprovider/generic/vmcluster.py", line 90, in start
await self.wait_for_scheduler()
File "/dask-cloudprovider/dask_cloudprovider/generic/vmcluster.py", line 50, in wait_for_scheduler
while not is_socket_open(ip, port):
File "/dask-cloudprovider/dask_cloudprovider/utils/socket.py", line 7, in is_socket_open
connection.connect((ip, int(port)))
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Timeout +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
FAILED
dask-cloudprovider/dask_cloudprovider/azure/tests/test_azurevm.py::test_create_rapids_cluster_sync ERROR
dask-cloudprovider/dask_cloudprovider/azure/tests/test_azurevm.py::test_render_cloud_init FAILED
dask-cloudprovider/dask_cloudprovider/azure/tests/test_azurevm.py::test_render_cloud_init ERROR
=================================================================================================================================== ERRORS ===================================================================================================================================
____________________________________________________________________________________________________________ ERROR at teardown of test_create_rapids_cluster_sync ____________________________________________________________________________________________________________
fixturedef = <FixtureDef argname='event_loop' scope='function' baseid=''>, request = <SubRequest 'event_loop' for <Function test_create_rapids_cluster_sync>>
@pytest.hookimpl(trylast=True)
def pytest_fixture_post_finalizer(fixturedef: FixtureDef, request: SubRequest) -> None:
"""Called after fixture teardown"""
if fixturedef.argname == "event_loop":
policy = asyncio.get_event_loop_policy()
try:
loop = policy.get_event_loop()
except RuntimeError:
loop = None
if loop is not None:
# Clean up existing loop to avoid ResourceWarnings
> loop.close()
opt/conda/lib/python3.8/site-packages/pytest_asyncio/plugin.py:364:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
opt/conda/lib/python3.8/asyncio/unix_events.py:58: in close
super().close()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <_UnixSelectorEventLoop running=True closed=False debug=False>
def close(self):
if self.is_running():
> raise RuntimeError("Cannot close a running event loop")
E RuntimeError: Cannot close a running event loop
opt/conda/lib/python3.8/asyncio/selector_events.py:89: RuntimeError
________________________________________________________________________________________________________________ ERROR at teardown of test_render_cloud_init _________________________________________________________________________________________________________________
fixturedef = <FixtureDef argname='event_loop' scope='function' baseid=''>, request = <SubRequest 'event_loop' for <Function test_render_cloud_init>>
@pytest.hookimpl(trylast=True)
def pytest_fixture_post_finalizer(fixturedef: FixtureDef, request: SubRequest) -> None:
"""Called after fixture teardown"""
if fixturedef.argname == "event_loop":
policy = asyncio.get_event_loop_policy()
try:
loop = policy.get_event_loop()
except RuntimeError:
loop = None
if loop is not None:
# Clean up existing loop to avoid ResourceWarnings
> loop.close()
opt/conda/lib/python3.8/site-packages/pytest_asyncio/plugin.py:364:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
opt/conda/lib/python3.8/asyncio/unix_events.py:58: in close
super().close()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <_UnixSelectorEventLoop running=True closed=False debug=False>
def close(self):
if self.is_running():
> raise RuntimeError("Cannot close a running event loop")
E RuntimeError: Cannot close a running event loop
opt/conda/lib/python3.8/asyncio/selector_events.py:89: RuntimeError
================================================================================================================================== FAILURES ==================================================================================================================================
______________________________________________________________________________________________________________________ test_create_rapids_cluster_sync _______________________________________________________________________________________________________________________
@pytest.mark.asyncio
@pytest.mark.timeout(1200)
@skip_without_credentials
@pytest.mark.external
async def test_create_rapids_cluster_sync():
> with AzureVMCluster(
vm_size="Standard_NC12s_v3",
docker_image="rapidsai/rapidsai:cuda11.0-runtime-ubuntu18.04-py3.8",
worker_class="dask_cuda.CUDAWorker",
worker_options={"rmm_pool_size": "15GB"},
) as cluster:
dask-cloudprovider/dask_cloudprovider/azure/tests/test_azurevm.py:88:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
dask-cloudprovider/dask_cloudprovider/azure/azurevm.py:570: in __init__
super().__init__(debug=debug, **kwargs)
dask-cloudprovider/dask_cloudprovider/generic/vmcluster.py:297: in __init__
super().__init__(**kwargs, security=self.security)
opt/conda/lib/python3.8/site-packages/distributed/deploy/spec.py:275: in __init__
self.sync(self._start)
opt/conda/lib/python3.8/site-packages/distributed/utils.py:338: in sync
return sync(
opt/conda/lib/python3.8/site-packages/distributed/utils.py:401: in sync
wait(10)
opt/conda/lib/python3.8/site-packages/distributed/utils.py:390: in wait
return e.wait(timeout)
opt/conda/lib/python3.8/threading.py:558: in wait
signaled = self._cond.wait(timeout)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <Condition(<unlocked _thread.lock object at 0x7fd20dd94420>, 0)>, timeout = 10
def wait(self, timeout=None):
"""Wait until notified or until a timeout occurs.
If the calling thread has not acquired the lock when this method is
called, a RuntimeError is raised.
This method releases the underlying lock, and then blocks until it is
awakened by a notify() or notify_all() call for the same condition
variable in another thread, or until the optional timeout occurs. Once
awakened or timed out, it re-acquires the lock and returns.
When the timeout argument is present and not None, it should be a
floating point number specifying a timeout for the operation in seconds
(or fractions thereof).
When the underlying lock is an RLock, it is not released using its
release() method, since this may not actually unlock the lock when it
was acquired multiple times recursively. Instead, an internal interface
of the RLock class is used, which really unlocks it even when it has
been recursively acquired several times. Another internal interface is
then used to restore the recursion level when the lock is reacquired.
"""
if not self._is_owned():
raise RuntimeError("cannot wait on un-acquired lock")
waiter = _allocate_lock()
waiter.acquire()
self._waiters.append(waiter)
saved_state = self._release_save()
gotit = False
try: # restore state no matter what (e.g., KeyboardInterrupt)
if timeout is None:
waiter.acquire()
gotit = True
else:
if timeout > 0:
> gotit = waiter.acquire(True, timeout)
E Failed: Timeout >1200.0s
opt/conda/lib/python3.8/threading.py:306: Failed
___________________________________________________________________________________________________________________________ test_render_cloud_init ___________________________________________________________________________________________________________________________
args = (), kwargs = {}, coro = <coroutine object test_render_cloud_init at 0x7fd20da17640>
@functools.wraps(func)
def inner(*args, **kwargs):
coro = func(*args, **kwargs)
if not inspect.isawaitable(coro):
pyfuncitem.warn(
pytest.PytestWarning(
f"The test {pyfuncitem} is marked with '@pytest.mark.asyncio' "
"but it is not an async function. "
"Please remove asyncio marker. "
"If the test is not marked explicitly, "
"check for global markers applied via 'pytestmark'."
)
)
return
> task = asyncio.ensure_future(coro, loop=_loop)
opt/conda/lib/python3.8/site-packages/pytest_asyncio/plugin.py:452:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
opt/conda/lib/python3.8/asyncio/tasks.py:672: in ensure_future
task = loop.create_task(coro_or_future)
opt/conda/lib/python3.8/asyncio/base_events.py:429: in create_task
self._check_closed()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <_UnixSelectorEventLoop running=False closed=True debug=False>
def _check_closed(self):
if self._closed:
> raise RuntimeError('Event loop is closed')
E RuntimeError: Event loop is closed
opt/conda/lib/python3.8/asyncio/base_events.py:508: RuntimeError
============================================================================================================================== warnings summary ==============================================================================================================================
dask_cloudprovider/azure/tests/test_azurevm.py::test_create_cluster
dask_cloudprovider/azure/tests/test_azurevm.py::test_create_cluster_sync
/opt/conda/lib/python3.8/contextlib.py:120: UserWarning: Creating your cluster is taking a surprisingly long time. This is likely due to pending resources. Hang tight!
next(self.gen)
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================================================================================================================== short test summary info ===========================================================================================================================
FAILED dask-cloudprovider/dask_cloudprovider/azure/tests/test_azurevm.py::test_create_rapids_cluster_sync - Failed: Timeout >1200.0s
FAILED dask-cloudprovider/dask_cloudprovider/azure/tests/test_azurevm.py::test_render_cloud_init - RuntimeError: Event loop is closed
ERROR dask-cloudprovider/dask_cloudprovider/azure/tests/test_azurevm.py::test_create_rapids_cluster_sync - RuntimeError: Cannot close a running event loop
ERROR dask-cloudprovider/dask_cloudprovider/azure/tests/test_azurevm.py::test_render_cloud_init - RuntimeError: Cannot close a running event loop
======================================================================================================= 2 failed, 3 passed, 2 warnings, 2 errors in 2228.83s (0:37:08) =======================================================================================================
sys:1: RuntimeWarning: coroutine 'test_render_cloud_init' was never awaited
I'll look into this once I can figure out how to enable all the logs in Azure.
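For anyone debugging in the meantime: the IO loop stack in the output above was inside `connection.connect` (called from `is_socket_open`) when the 1200 s timeout fired. Below is a minimal sketch of a bounded TCP probe that can be run by hand against the scheduler address to check whether the port is reachable. The name `probe_socket` and the `timeout` parameter are hypothetical and not part of dask-cloudprovider, and this is not a claim about how `is_socket_open` is implemented or how it should be fixed.

```python
import socket


def probe_socket(ip, port, timeout=1.0):
    """Return True if a TCP connection to (ip, port) succeeds within `timeout` seconds.

    Hypothetical diagnostic helper, not dask-cloudprovider code.
    """
    try:
        # create_connection applies the timeout to the connect call itself,
        # so an unreachable host cannot block for longer than `timeout`.
        with socket.create_connection((ip, int(port)), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts
        # (socket.timeout is a subclass of OSError).
        return False
```

If the probe returns `False` for the scheduler's IP and port, the scheduler process may not have started on the VM, or the port may be blocked; either way the VM-side logs would be the next place to look.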
jacobtomlinson