
Prevent stacking of power requests #1023

Conversation

@daniel-zullo-frequenz (Contributor) commented Aug 5, 2024

The power distributing actor processes one power request at a time to prevent multiple requests for the same components from being sent to the microgrid API concurrently. Previously, this could cause the request channel receiver to fill up if power requests arrived faster than they could be processed. Even worse, requests could be processed late, causing unexpected behavior for applications setting power requests. Moreover, the actor blocked power requests for different sets of components from being processed while any request was in flight.

This patch ensures that the actor processes requests for different sets of components concurrently, one request at a time per set, and keeps track of the latest pending request when a request for the same set of components is already being processed. The pending request is overwritten by the latest received request for the same set of components, and the actor processes it once the in-flight request for those components finishes.
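
For illustration, a minimal self-contained sketch of this queueing behavior (all names here, such as Request, distribute and handle, are hypothetical stand-ins, not the actual SDK identifiers):

    import asyncio
    from dataclasses import dataclass

    @dataclass
    class Request:
        """Stand-in for the SDK's power request (illustrative only)."""
        component_ids: frozenset[int]
        power_watts: float

    pending: dict[frozenset[int], Request] = {}
    running: dict[frozenset[int], asyncio.Task[None]] = {}

    async def distribute(request: Request) -> None:
        """Placeholder for the call to the microgrid API."""
        await asyncio.sleep(0.1)

    def handle(request: Request) -> None:
        req_id = frozenset(request.component_ids)
        if req_id in running:
            # A request for these components is already in flight:
            # overwrite any pending request so only the latest survives.
            pending[req_id] = request
            return
        running[req_id] = asyncio.create_task(_process(req_id, request))

    async def _process(req_id: frozenset[int], request: Request) -> None:
        await distribute(request)
        del running[req_id]
        # If a newer request arrived while distributing, run it next.
        if next_request := pending.pop(req_id, None):
            running[req_id] = asyncio.create_task(_process(req_id, next_request))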

@github-actions bot added the part:tests Affects the unit, integration and performance (benchmarks) tests and part:actor Affects an actor or the actors' utilities (decorator, etc.) labels Aug 5, 2024
@daniel-zullo-frequenz force-pushed the fix/power-distributing-actor branch from e1cdce4 to 85d0409 on August 5, 2024 20:16
@github-actions bot added the part:docs Affects the documentation label Aug 5, 2024
@daniel-zullo-frequenz (Contributor, Author)

@shsms or @llucax, would you mind having an initial look at this draft?

@llucax (Contributor) left a comment

I didn't check the docs updates yet; I'm focusing on the code first.


def cleanup(task: asyncio.Task[None]) -> None:
    task_name = task.get_name()
    assert task_name in power_tasks, "Task not found in power_tasks"
Contributor

I would probably log an error here and return. I guess bringing the whole thing down doesn't make sense just because we can't clean something up. Or maybe this is just ignored, because it's called via add_done_callback() 🤔

In any case, I think logging an error is the right approach.
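
A rough sketch of what that could look like (assuming a module-level _logger, and that the callback's job is to drop the finished task from power_tasks):

    import asyncio
    import logging

    _logger = logging.getLogger(__name__)
    power_tasks: dict[str, asyncio.Task[None]] = {}

    def cleanup(task: asyncio.Task[None]) -> None:
        task_name = task.get_name()
        if task_name not in power_tasks:
            # Raising here wouldn't bring the actor down anyway (asyncio
            # routes callback exceptions to the event loop's exception
            # handler), so log the inconsistency and keep going.
            _logger.error("Task %s not found in power_tasks", task_name)
            return
        del power_tasks[task_name]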

Contributor (Author)

I was mainly using this to find programming errors; the logic is wrong if this ever happens. Anyway, we can also log it as an error.

Contributor (Author)

Updated

Contributor

Yeah, I agree the logic is wrong, but it doesn't seem like a very bad thing if it happens for some reason. It only means someone, somehow, removed the task, or it wasn't added in the first place, and we can keep working with what we have.

Contributor (Author)

Right, I saw your point earlier and updated it. The assertion was mainly useful for me while coding and debugging the fix.

power_tasks[task_name] = asyncio.create_task(
    self._component_manager.distribute_power(request),
    name=task_name,
)
power_tasks[task_name].add_done_callback(cleanup)
Contributor

This seems to actually be a perfect use case for a class I'm planning to add to core soon™️ as part of the revamp of background service (I'm splitting out the task-handling code to have a TaskGroup that is more useful for us for cases like this).

Contributor (Author)

Sounds great!

Comment on lines 148 to 154
if task_name in power_tasks:
    result = Error(
        request=request,
        msg="A request for the same components is being processed. Skipped.",
    )
    await self._result_sender.send(result)
    continue
Contributor

Wouldn't it be better to cancel the ongoing request and do the new one instead? Maybe not, because we could end up with no request ever finishing if we get a burst of requests?

Contributor

I agree that the latest one needs to be executed, but instead of cancelling the old one, I think we should keep the latest request and send it as soon as the running one is complete. That is, discard only the older requests and execute the latest.

@daniel-zullo-frequenz (Contributor, Author) commented Aug 6, 2024

> Maybe not, because we could end up with no request ever finishing if we get a burst of requests?

Indeed, my thought here was to avoid entering a loop of cancelling the in-progress request whenever a new one arrives. This is a bit tricky because we don't really know the state of the request currently being processed, so I decided to ignore/skip the new request. This is the point we all need to agree on. @shsms, any thoughts on this?

> Wouldn't it be better to cancel the ongoing request and do the new one instead?

According to the (outdated) documentation, the actor used to do this. The current state (without this patch) is that the actor just queues up requests.

Contributor (Author)

@shsms I hadn't seen your comment when I replied to this thread.

Contributor

Absolutely, I agree with the reasoning above on why cancelling doesn't make sense. We shouldn't cancel.

Contributor (Author)

Updated to keep track of the latest pending power request

Comment on lines 149 to 153
result = Error(
    request=request,
    msg="A request for the same components is being processed. Skipped.",
)
await self._result_sender.send(result)
Contributor

I think we shouldn't call this an Error. I guess we need a new result type, maybe Discarded.

But I would rather not send a message at all: keep the latest request, discard just the older ones, and execute the latest request as soon as the running one is complete.
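
For illustration, such a result type might look like this (a hypothetical shape, not an actual SDK type):

    from dataclasses import dataclass
    from typing import Any

    @dataclass(frozen=True)
    class Discarded:
        """Hypothetical result: the request was superseded before it ran."""
        request: Any  # the power request that was discarded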

Contributor (Author)

> But I would rather not send a message at all: keep the latest request, discard just the older ones, and execute the latest request as soon as the running one is complete.

Interesting. So you'd like to keep the current in-progress request (without cancelling it) and queue only the latest request that couldn't be processed yet because a similar one is in progress. It makes sense to me. @llucax, what do you think about this idea?

Contributor (Author)

This is no longer sending the Error; instead it keeps track of the latest received power request.

Contributor

The idea of finishing the current request and keeping only the latest one to execute next is great!

About not sending responses: I don't remember how responses work, but shouldn't you send a response for every request? If not, then great; if yes, I also agree Discarded is better than Error.

Contributor (Author)

@shsms, your input is needed here.

My understanding is that a response only needs to be sent if the request is processed (whether it succeeded, partially failed, errored, etc.). The current patch works as a request replacement if the previous request was still queued up.

Contributor

Yes. Ever since the power manager was introduced, there has not been a one-to-one mapping between user proposals and responses from the power distributor. So just a log message should be enough, saying that a new request was received before the previous one could start executing, so it was discarded. And I think we have something of this form.


@daniel-zullo-frequenz force-pushed the fix/power-distributing-actor branch 2 times, most recently from a24e784 to 657bb46 on August 7, 2024 08:55
@daniel-zullo-frequenz (Contributor, Author)

@llucax @shsms I haven't finished updating the documentation. Would you mind having another look, just focusing on the code?

@daniel-zullo-frequenz self-assigned this Aug 7, 2024
@daniel-zullo-frequenz added the priority:high Address this as soon as possible label Aug 7, 2024
@daniel-zullo-frequenz force-pushed the fix/power-distributing-actor branch from 77655e1 to 9d4ad72 on August 7, 2024 15:18
@daniel-zullo-frequenz marked this pull request as ready for review August 7, 2024 15:18
@daniel-zullo-frequenz requested a review from a team as a code owner August 7, 2024 15:18
@daniel-zullo-frequenz requested review from Marenz and removed request for a team August 7, 2024 15:18
@daniel-zullo-frequenz (Contributor, Author)

Updated documentation

@daniel-zullo-frequenz force-pushed the fix/power-distributing-actor branch from 9d4ad72 to 0a731af on August 7, 2024 15:31

@llucax (Contributor) commented Aug 8, 2024

Just as a side note, I find the code a bit convoluted, but I think we can address that in the future.

@daniel-zullo-frequenz (Contributor, Author)

Replaced the task name previously used as a key to identify processing and pending requests; the request ID, which is a frozenset of the component IDs, is now used instead. This should make the code flexible enough to address #1030
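
For illustration, the request ID is derived roughly like this (the field name is an assumption):

    # A hashable, order-independent ID: requests targeting the same set
    # of components map to the same key.
    req_id = frozenset(request.component_ids)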

@daniel-zullo-frequenz (Contributor, Author)

> Just as a side note, I find the code a bit convoluted, but I think we can address that in the future.

Ok, let me know if you have some ideas to improve it and I can at least create an issue.
My attempt to use the high-level asyncio functionality failed for our use case. I also tried to use the task set provided by the actor, but unfortunately it wasn't suitable for our use case either.

@llucax (Contributor) commented Aug 9, 2024

> Ok, let me know if you have some ideas to improve it and I can at least create an issue.

I actually tried to suggest something where we abstract the ongoing requests in an object that holds the task and the pending request (if any), but it was still complicated to implement like that because of how we need to keep track of the tasks. I think the new PersistentTaskGroup in core can help a lot here, but it is not there yet :)

@daniel-zullo-frequenz (Contributor, Author)

> I actually tried to suggest something where we abstract the ongoing requests in an object that holds the task and the pending request (if any), but it was still complicated to implement like that because of how we need to keep track of the tasks. I think the new PersistentTaskGroup in core can help a lot here, but it is not there yet :)

I had a quick look and made a comment there to check whether we can make it 100% suitable for the use case in this PR. Please be aware that I probably misunderstood the PersistentTaskGroup capabilities.

In any case I can create an issue in the SDK to refactor the PowerDistributingActor using PersistentTaskGroup.

@llucax (Contributor) commented Aug 9, 2024

I actually stayed on this issue, thinking about it again...

[.... one hour later, due to some of the changes you did with the request ID :D ...]

It was a good exercise to see how to apply the new class. The result is pretty different from what I had in mind before, and it doesn't look particularly simple either in retrospect, but I think it is more efficient, because we only keep one task per req_id as long as we have requests queued up, instead of starting and finishing a task for every pending request.

    async def _run(self) -> None:
        await self._component_manager.start()

        async with PersistentTaskGroup() as group:
            async for request in self._requests_receiver:
                self._handle_request(request, group)
                # Drain tasks that already finished, without blocking,
                # just to surface their exceptions.
                async for task in group.as_completed(timeout=0):
                    try:
                        task.result()
                    except Exception:  # pylint: disable=broad-except
                        _logger.exception("Failed power request task: %s", task.get_name())

    def _handle_request(self, request: ..., group: ...) -> None:
        req_id = self._get_request_id(request)

        if pending_request := self._pending_requests.get(req_id):
            _logger.debug(
                "Pending request: %s, overwritten with request: %s",
                pending_request,
                request,
            )
            # A task for this req_id is already running: just replace
            # the stored request so only the latest one gets executed.
            self._pending_requests[req_id] = request
            return

        self._pending_requests[req_id] = request
        group.create_task(
            self._distribute_power(req_id),
            name=f"{type(self).__name__}:{request}",
        )

    async def _distribute_power(self, req_id: ...) -> None:
        while request := self._pending_requests.get(req_id):
            await self._component_manager.distribute_power(request)
            # Clear the entry only if no newer request replaced it
            # while we were distributing.
            if self._pending_requests.get(req_id) is request:
                del self._pending_requests[req_id]

At some point I want to add a way to get as_completed() as a Receiver, so we can put it in a select(), like:

        async with PersistentTaskGroup() as group:
            completed_tasks = CompletedTaskReceiver(group)  # This needs to be added to channels
            async for selected in select(self._requests_receiver, completed_tasks):
                if selected_from(self._requests_receiver, selected):
                    self._handle_request(selected.message, group)
                elif selected_from(completed_tasks, selected):
                    task = selected.message
                    try:
                        task.result()
                    except Exception:  # pylint: disable=broad-except
                        _logger.exception("Failed power request task: %s", task.get_name())

But for now we can live with using it with timeout=0 so it doesn't block; we only ACK finished tasks after handling a power request, which should be OK, since I guess we receive requests often enough.

@llucax (Contributor) left a comment

LGTM. Will leave the final approval to @shsms as he had some comments too.

@daniel-zullo-frequenz (Contributor, Author)

@llucax the final result looks great! I'll create an issue referencing your draft to address it once PersistentTaskGroup is available

@shsms previously approved these changes Aug 9, 2024
@shsms (Contributor) left a comment

LGTM as well.

@daniel-zullo-frequenz (Contributor, Author)

Thanks, I'll rebase it onto the latest v1.x.x since there are some conflicts with the release notes.

daniel-zullo-frequenz and others added 4 commits August 9, 2024 12:06
The documentation was updated to reflect the current
state of the actor.

Signed-off-by: Daniel Zullo <[email protected]>
Add an entry to the release notes about preventing stacking
of power requests, to avoid delays in processing when requests
arrive faster than they can be processed.

Signed-off-by: Daniel Zullo <[email protected]>
The power distributing actor processes one power request at a time
to prevent multiple requests for the same components from being sent
to the microgrid API concurrently. Previously, this could lead to
the request channel receiver becoming full if power requests
arrived faster than they could be processed. Even worse, the
requests could be processed late, causing unexpected behavior
for applications setting power requests. Moreover, the actor was
blocking power requests with different sets of components from being
processed if there was any existing request.

This patch ensures that the actor processes one request at a time
for different sets of components and keeps track of the latest pending
request if there is an existing request with the same set of
components being processed. The pending request will be overwritten
by the latest received request with the same set of components,
and the actor will process it once the request with the same components
is done processing.

Signed-off-by: Daniel Zullo <[email protected]>
Changed log type for ignored requests to avoid spamming, as there are many reasons a request
might be ignored.

Co-authored-by: Leandro Lucarella <[email protected]>
Signed-off-by: daniel-zullo-frequenz <[email protected]>
@daniel-zullo-frequenz (Contributor, Author)

Just for the record, created #1032 to improve the actor code

@daniel-zullo-frequenz added this pull request to the merge queue Aug 9, 2024
Merged via the queue into frequenz-floss:v1.x.x with commit e2a792c Aug 9, 2024
18 checks passed
@daniel-zullo-frequenz deleted the fix/power-distributing-actor branch August 9, 2024 10:57
Labels
part:actor Affects an actor or the actors' utilities (decorator, etc.) part:docs Affects the documentation part:tests Affects the unit, integration and performance (benchmarks) tests priority:high Address this as soon as possible