Open
Labels: provider/aws/ecs (Cluster provider for AWS ECS)
Description
I use `ECSCluster` heavily, but it has many small implementation issues that make it hard to run large Dask clusters reliably. Here's a laundry list of the problems I've run into.
- Task names are always derived from the cluster name. In a shared ECS cluster, task definitions overlap between multiple `ECSCluster` instances, making it impossible to tell which tasks belong to which Dask cluster.
- API rate limits are not handled properly. Combined with the log parsing used to discover addresses (see #313, "Use ECS API to set Worker/Scheduler address instead of parsing logs", and #121, "RuntimeWarning: get_log_events rate limit exceeded"), this makes large clusters hard to instantiate reliably because the worker IP addresses can't be found.
- `ECSCluster` instantiates tasks directly instead of using a service, so there's no good way to apply placement strategies such as binpack.
- Exited tasks are not handled and rescheduled. When workers run on spot instances, the Dask cluster can gradually lose workers as spot instances come and go.
- There's no way to configure capacity providers for tasks.
- There's no way to configure different subnets, environment variables, etc. for schedulers vs workers.
- There's no way to configure the log driver to something other than awslogs.
- Scaling a cluster while a previous scale is still in progress sometimes fails.
- Too many IAM permissions are required, even when using pre-existing ECS clusters and resources.
- Deprovisioning of tasks for both workers and schedulers is not clean (see #262, "ECSCluster does not de-provision tasks after failing to connect to scheduler").
- Closing the client and cluster objects results in dangling hooks.
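
To illustrate the rate-limit item above: a common way to cope with throttled API calls is to retry them with exponential backoff and jitter. The following is a minimal sketch under stated assumptions, not how `ECSCluster` currently behaves. `ThrottledError` and `call_with_backoff` are hypothetical names introduced here; with boto3 the retry condition would instead be a `botocore.exceptions.ClientError` whose error code indicates throttling.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a rate-limit error from an AWS API call."""


def call_with_backoff(func, *, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep, retry_on=(ThrottledError,)):
    """Call func(), retrying on throttling with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            # Full jitter: wait a random time up to the capped exponential delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
```

The `sleep` parameter is injectable only so the helper can be exercised without real waiting. botocore also ships its own retry handling, configurable through `botocore.config.Config(retries={"max_attempts": ..., "mode": "adaptive"})`, which may be enough on its own for some of these calls.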
Rather than trying to morph the existing `ECSCluster` class, would this project be open to a completely new implementation (`ECSCluster2`?)? I anticipate that API changes will be required (i.e., to the arguments of `ECSCluster`). I'm willing to tackle this myself.
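
As a purely illustrative sketch of the kind of argument change this might involve, per-role settings could be grouped into one options object that the scheduler and workers each override. Every name below (`TaskOptions`, `with_overrides`, the field names, the subnet IDs, the environment variable) is hypothetical and is not part of the existing `ECSCluster` API or a concrete proposal.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class TaskOptions:
    """Hypothetical per-role task settings (illustrative names only)."""
    subnets: List[str] = field(default_factory=list)
    environment: Dict[str, str] = field(default_factory=dict)
    log_driver: str = "awslogs"
    capacity_provider: Optional[str] = None


def with_overrides(base: TaskOptions, **overrides) -> TaskOptions:
    """Return a copy of base with the given fields replaced."""
    return replace(base, **overrides)


# Shared defaults, then role-specific overrides (all values are placeholders).
shared = TaskOptions(subnets=["subnet-shared"], environment={"EXAMPLE_VAR": "1"})
scheduler_options = with_overrides(shared, capacity_provider="FARGATE")
worker_options = with_overrides(
    shared,
    subnets=["subnet-workers"],
    capacity_provider="FARGATE_SPOT",
    log_driver="awsfirelens",
)
```

Keeping the scheduler and worker settings as separate objects derived from shared defaults would address the items about subnets, environment variables, log drivers, and capacity providers without multiplying keyword arguments.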