Fix MQTT Resilience and Command Preservations #227

Open
wants to merge 6 commits into
base: master

danmrossi

I hope I am doing this right. This PR should contain the MQTT resilience fixes and the command preservation changes.

client.loop_start() spins up Paho’s I/O loop in a separate thread, honoring the reconnect_delay_set back-off policy.

The while not stop_event.is_set(): time.sleep(1) loop keeps the main thread alive, so the other threads (metrics gathering + update checker) can run and the on_message callbacks still fire.

When someone publishes “shutdown”/“restart” (or you hit Ctrl+C), you set stop_event, exit the loop, call client.loop_stop(), and then exit cleanly.

Because we’re no longer using loop_forever, there’s no more attempt to call .recv() on a missing socket, and the commands will continue to work after a broker restart.

Worker‐thread variables declared up front

Old: thread1 and thread2 only ever appeared inside __main__.

New: thread1 = None and thread2 = None are defined near the top, so both your connect handler and shutdown logic can safely refer to them.

on_message uses subprocess.run() instead of os.system()

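All reboot/shutdown/display commands now call subprocess.run([...], check=True) instead of plain os.system("…"). With check=True, a non-zero exit status raises subprocess.CalledProcessError, so a failed command can be caught and logged rather than silently ignored.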
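To illustrate the difference, here is a minimal sketch of the subprocess.run(..., check=True) pattern. The helper name run_command is hypothetical and not taken from this PR, and the Python interpreter stands in for the real reboot/shutdown/display commands so the example is harmless to run.

```python
import subprocess
import sys


def run_command(args):
    """Run a command given as an argument list; return True on success.

    Hypothetical helper for illustration, not the PR's actual code.
    """
    try:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(args, check=True)
        return True
    except subprocess.CalledProcessError as exc:
        print(f"command {args} failed with exit code {exc.returncode}")
        return False
    except OSError as exc:
        # Raised when the executable is missing or cannot be started.
        print(f"could not start {args[0]}: {exc}")
        return False


# Stand-in commands: one that exits 0 and one that exits 3.
ok = run_command([sys.executable, "-c", "import sys; sys.exit(0)"])
failed = run_command([sys.executable, "-c", "import sys; sys.exit(3)"])
```

With os.system the exit status is only a return value that is easy to ignore; here a failure surfaces as an exception that the handler has to deal with explicitly.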

Graceful “install” path tears down threads

The install branch’s update_and_exit() now sets stop_event, joins both threads if alive, and then exits, ensuring no stray background tasks.
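The teardown described above might look roughly like the sketch below. The names stop_event, thread1 and thread2 come from this description; worker and shutdown_workers are hypothetical stand-ins, and the final process exit that update_and_exit() performs is left out so the example can run to completion.

```python
import threading

stop_event = threading.Event()

# Declared up front so any handler can refer to them before they are started.
thread1 = None
thread2 = None


def worker():
    """Placeholder for the metrics / update-checker loops (assumption)."""
    while not stop_event.is_set():
        stop_event.wait(0.05)


def shutdown_workers(timeout=2.0):
    """Signal both workers to stop and wait for them to finish."""
    stop_event.set()
    for t in (thread1, thread2):
        # Guard against threads that were never created or already exited.
        if t is not None and t.is_alive():
            t.join(timeout)


thread1 = threading.Thread(target=worker, daemon=True)
thread2 = threading.Thread(target=worker, daemon=True)
thread1.start()
thread2.start()

shutdown_workers()
```

The None check is what makes the up-front declarations useful: the shutdown path can run safely even if a message arrives before the workers were started.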

gather_and_send_info() wrapped in a try/except

Your main metric‐collection loop is now protected so that any unexpected exception inside it will be caught and logged rather than killing the whole service.
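As a rough sketch of that protection, the loop below catches and logs any exception from one collection pass and then carries on. The signature shown here (a collect callable plus a stop event) is an assumption made for the example; the real gather_and_send_info() in this PR may be structured differently.

```python
import logging
import threading

logger = logging.getLogger("metrics")


def gather_and_send_info(collect, stop_event, interval=0.01):
    """Call collect() repeatedly until stop_event is set.

    Hypothetical signature for illustration only.
    """
    while not stop_event.is_set():
        try:
            collect()
        except Exception:
            # Log the traceback and keep the loop alive.
            logger.exception("metric collection failed; will retry")
        # wait() returns early once stop_event is set.
        stop_event.wait(interval)


calls = []
stop = threading.Event()


def flaky_collect():
    """Fails on the first call, succeeds afterwards, stops on the third."""
    calls.append(len(calls))
    if len(calls) == 1:
        raise RuntimeError("sensor unavailable")
    if len(calls) >= 3:
        stop.set()


gather_and_send_info(flaky_collect, stop)
```

The first call raises, yet the loop goes on to make the second and third calls, which is the behavior the description is after.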

Switched from loop_forever() to loop_start()/loop_stop()

Old: Paho’s client.loop_forever(retry_first_connection=True) ran in the main thread (and hid socket teardown bugs).

New: The network loop is kicked off with client.loop_start() once at startup, and explicitly stopped in the finally: block (or in your install/reboot paths) with client.loop_stop().

Main thread now just waits on stop_event

Instead of blocking inside Paho’s loop, there’s a simple while not stop_event.is_set(): time.sleep(1) in __main__. Ctrl+C (or an MQTT “install” message) sets stop_event and flows cleanly to shutdown.
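The overall shape of the new main loop can be sketched as follows. StubClient is a stand-in that exposes only loop_start() and loop_stop(), so the example runs without a broker or the Paho library installed; run_until_stopped and the poll parameter are hypothetical names, and the timer simulates an incoming "install" or "shutdown" message.

```python
import threading
import time


class StubClient:
    """Stand-in for a Paho client; records calls instead of doing network I/O."""

    def __init__(self):
        self.events = []

    def loop_start(self):
        self.events.append("loop_start")

    def loop_stop(self):
        self.events.append("loop_stop")


def run_until_stopped(client, stop_event, poll=0.05):
    """Start the network loop once, idle until stop_event, then stop it once."""
    client.loop_start()
    try:
        while not stop_event.is_set():
            time.sleep(poll)
    except KeyboardInterrupt:
        # Ctrl+C takes the same path as an MQTT-triggered stop.
        stop_event.set()
    finally:
        client.loop_stop()


stop_event = threading.Event()
client = StubClient()

# Simulate a callback setting stop_event shortly after startup.
threading.Timer(0.2, stop_event.set).start()
run_until_stopped(client, stop_event)
```

Putting loop_stop() in the finally block means the network thread is stopped exactly once, whichever way the wait loop ends.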

Return codes and exit flags unified

You no longer mix exit_flag = True and sys.exit(0) in several places; everything now uses stop_event (and thread joins) to coordinate a single, orderly teardown.

Why this fixes the NoneType.recv crash

Switching away from loop_forever() (which can tear down the socket under certain reconnect races) to a single, continuously running loop_start() thread, stopped only once when you really exit, prevents the Paho client from calling .recv() on a None socket.

Fix spelling of default
Fix spelling of version