Fix MQTT Resilience and Command Preservations #227

danmrossi · 2025-06-11T07:58:41Z

Hope I am doing this right. This should include the MQTT resilience fixes and the command preservation changes.

client.loop_start() spins up Paho’s I/O loop in a separate thread, honoring the reconnect_delay_set back-off policy. The while not stop_event.is_set(): time.sleep(1) loop keeps the main thread alive, so the other threads (metrics gathering and the update checker) can run and the on_message callbacks still fire. When someone publishes “shutdown” or “restart” (or you hit Ctrl+C), stop_event is set, the loop exits, client.loop_stop() is called, and the process exits cleanly. Because we’re no longer using loop_forever, there is no further attempt to call .recv() on a missing socket, and the commands continue to work after a broker restart.
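As a rough sketch of the pattern described above (not the actual code in this PR): the function name `run_until_stopped` and the `poll_interval` parameter are made up for illustration, and `client` is assumed to be a `paho.mqtt.client.Client` (or anything exposing `loop_start()` and `loop_stop()`).

```python
import threading
import time


def run_until_stopped(client, stop_event: threading.Event, poll_interval: float = 1.0) -> None:
    """Keep the main thread alive while the MQTT network loop runs in the background.

    `client` is assumed to expose loop_start() and loop_stop(), as a Paho client does.
    """
    # Start the network loop in its own thread; reconnects are handled there.
    client.loop_start()
    try:
        # The main thread only waits; callbacks and worker threads do the real work.
        while not stop_event.is_set():
            time.sleep(poll_interval)
    except KeyboardInterrupt:
        # Ctrl+C takes the same shutdown path as an MQTT command.
        stop_event.set()
    finally:
        # Stop the network loop exactly once, on the way out.
        client.loop_stop()
```

In the script this would presumably be called once from `__main__`, after `client.reconnect_delay_set(...)` and the connect call, with an `on_message` handler that sets `stop_event` when a shutdown-style command arrives.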

Worker-thread variables declared up front
Old: thread1 and thread2 only ever appeared inside __main__. New: thread1 = None and thread2 = None are defined near the top, so both your connect handler and shutdown logic can safely refer to them.

on_message uses subprocess.run() instead of os.system()
All reboot/shutdown/display commands now call subprocess.run([...], check=True) for better error handling, instead of plain os.system("…").

Graceful “install” path tears down threads
The install branch’s update_and_exit() now sets stop_event, joins both threads if alive, and then exits, ensuring no stray background tasks.

gather_and_send_info() wrapped in a try/except
Your main metric-collection loop is now protected so that any unexpected exception inside it will be caught and logged rather than killing the whole service.

Switched from loop_forever() to loop_start()/loop_stop()
Old: Paho’s client.loop_forever(retry_first_connection=True) ran in the main thread (and hid socket teardown bugs). New: The network loop is kicked off with client.loop_start() once at startup, and explicitly stopped in the finally: block (or in your install/reboot paths) with client.loop_stop().

Main thread now just waits on stop_event
Instead of blocking inside Paho’s loop, there’s a simple while not stop_event.is_set(): time.sleep(1) in __main__. Ctrl+C (or an MQTT “install” message) sets stop_event and flows cleanly to shutdown.

Return codes and exit flags unified
You no longer mix exit_flag = True and sys.exit(0) in several places. Everything uses stop_event (and thread joins) to coordinate a single, orderly teardown.

Why this fixes the NoneType.recv crash
Switching away from loop_forever() (which can tear down the socket under certain reconnect races) to a single, continuously running loop_start() thread, and only stopping it once when you really exit, prevents the Paho client from ever calling .recv() on a None socket.
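The subprocess.run(), try/except, and thread-join changes listed above could look roughly like the sketch below. It is an illustration under assumptions, not the code from rpi-cpu2mqtt.py: the helper names `run_command`, `guarded_loop`, and `shutdown_workers`, the logger name, and the `interval`/`timeout` parameters are all hypothetical.

```python
import logging
import subprocess
import threading

log = logging.getLogger("rpi-cpu2mqtt-sketch")  # hypothetical logger name


def run_command(argv):
    """Run a system command given as an argument list; return True on success."""
    try:
        # check=True raises CalledProcessError on a non-zero exit status,
        # which os.system() would only report as a return value.
        subprocess.run(argv, check=True)
        return True
    except (subprocess.CalledProcessError, OSError) as exc:
        log.error("command %s failed: %s", argv, exc)
        return False


def guarded_loop(collect, stop_event, interval=1.0):
    """Call collect() repeatedly until stop_event is set, logging any exception."""
    while not stop_event.is_set():
        try:
            collect()
        except Exception:
            # One bad iteration is logged and the loop keeps going.
            log.exception("metrics collection failed; continuing")
        # wait() returns early once stop_event is set, so shutdown is prompt.
        stop_event.wait(interval)


def shutdown_workers(stop_event, threads, timeout=5.0):
    """Signal the workers to stop, then join those that were started and are alive."""
    stop_event.set()
    for t in threads:
        # Entries may still be None if the thread was never created.
        if t is not None and t.is_alive():
            t.join(timeout)
```

The `None` check in `shutdown_workers` mirrors the idea of declaring thread1 and thread2 as None up front: the teardown path can run safely even if a worker was never started.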

Fix spelling of default

Fix spelling of version

danmrossi added 6 commits June 11, 2025 14:51

Fix MQTT Error Handling

6d505c0

fix mqtt command subscription

a9a0ffd

Update install.sh

a983035

Fix spelling of default

Update rpi-cpu2mqtt.py

543e86b

Fix spelling of version
