Watchdog

Introduction

The application manager features a built-in watchdog mechanism that monitors the main thread's event loop, every Quick window's render thread and all clients of the application manager's Wayland compositor for unresponsive behavior. The event loop and render thread monitoring is implemented for both the System UI as well as for any QML application running in multi-process mode.

The watchdog is implemented as a separate thread that periodically (see checkInterval) checks the state of the monitored subsystems. If any of these fail to respond within a given time frame, the watchdog will first issue a warning (see warnTimeout) and eventually kill (see killTimeout) the affected thread or client. Please keep in mind, that due to the periodic nature of this check, the actual warning and killing timeout messages might be delayed by up to the checkInterval. Killing the affected thread directly (instead of just aborting the whole process) will cause the application manager's crash handler to print a backtrace for the stuck thread, which can be very useful to diagnose freezes.

Note: The watchdog is disabled by default. You need to enable it by setting at least one of the checkInterval configuration values in the main configuration file to a timeout that suits your specific device setup.

Systemd Support

Support for systemd's watchdog is built into the application manager as well: see {Installation}.

If enabled, the application manager will automatically detect at startup if it was launched by systemd and if the systemd unit file has the WatchdogSec option set. If this is the case, the application manager will periodically send the requested notifications to systemd from its watchdog thread.

Logging

The watchdog logs all its messages to the am.wd logging category. All logging is done from the separate watchdog thread and the main thread to minimize interference with the monitored threads or render loops.

The following logging levels are used:

Log LevelDescription
infoThe watchdog started (or stopped) watching an object (thread, window, Wayland client).
warningA warnTimeout has been exceeded.
criticalA killTimeout has been exceeded.

Performance Considerations

Nothing in life comes for free and the watchdog is no exception. While the overhead of the watchdog is generally very low, it does impact three areas:

  • For every frame rendered, the watchdog adds three invocations of a direct signal/slot connection: each call retrieves the current system time and stores it via an atomic fetch-and-store operation.
  • For every Qt event delivered in a watched thread, the watchdog adds two callbacks: each call checks the state via an atomic load, then retrieves the current system time, but only one stores it via an atomic fetch-and-store operation.
  • The separate watchdog thread runs a periodic check (see checkInterval). It retrieves the current system time and then collects time data via atomic load operations once for each of the watched objects.

Configuration

The watchdog is configured via the watchdog key in the main configuration file. Applications inherit these settings, but can also override any value by setting the corresponding key in their info.yaml manifest file.

The following interval and timeout values listed below let you specify the exact times with milli-seconds precision.

Setting any of the values to 0ms (or off) disables the respective functionality.

There's also the --disable-watchdog command line option that makes your life easier when debugging or testing in a production environment, as it completely disables all watchdog functionality in the System UI as well as in QML applications.

Config KeyTypeDescription
eventloop/checkIntervaldurationIf set to a positive time duration, the main event loop will be monitored by triggering a timer every checkInterval. (default: off)
eventloop/warnTimeoutdurationIn case the check timer is not firing within warnTimeout, the watchdog will print a warning. In addition another warning will be printed if the timer does eventually fire, stating the exact duration the event loop was blocked. (default: off)
eventloop/killTimeoutdurationIn case the check timer is not firing within killTimeout, the watchdog will print a critical warning and then abort the thread running the main event loop. (default: off)
quickwindow/checkIntervaldurationThe render thread monitor works a bit differently to the event loop and Wayland one: Instead of just a single "blocked" state, three different states are monitored:
  • Sync: The time it takes for the render thread to synchronize with the main thread.
  • Render: The time it takes for the render thread to actually render a frame.
  • Swap: The time the render thread spends in the graphics driver, swapping buffers.

As a render thread is not always actively rendering, the watchdog will only print a warning every checkInterval, if the thread is active and stuck in one of the aforementioned states. This periodic report also contains some statistics on how often the render thread got stuck in each state. (default: off)

quickwindow/warnTimeoutdurationThe watchdog will print a warning if a render thread is stuck in any of the syncing, rendering or swapping states for longer than warnTimeout. In addition another warning will be printed if the thread eventually leaves the state it was stuck in, stating the exact duration it was blocked. (default: off)
quickwindow/killTimeoutdurationIn case a render thread is stuck in any of the syncing, rendering or swapping states for longer than killTimeout, the watchdog will print a critical warning and then abort the thread. (default: off)
wayland/checkIntervaldurationIf set to a positive time duration, all currently active Wayland clients that use the XDG shell protocol will be pinged every checkInterval. (default: off)
wayland/warnTimeoutdurationIn case the pong reply from the Wayland client is not received within warnTimeout, the watchdog will print a warning. In addition another warning will be printed if the pong reply is eventually received, stating the exact duration the ping/pong round-trip took. (default: off)
wayland/killTimeoutdurationIn case the pong reply from the Wayland client is not received within killTimeout, the watchdog will print a critical warning and then kill the unresponsive Wayland client. For application manager apps, ApplicationObject::stop() with forceKill set to true will be invoked. Other apps will be killed by raising SIGKILL on the process id associated with the Wayland client. (default: off)

© 2024 The Qt Company Ltd. Documentation contributions included herein are the copyrights of their respective owners. The documentation provided herein is licensed under the terms of the GNU Free Documentation License version 1.3 as published by the Free Software Foundation. Qt and respective logos are trademarks of The Qt Company Ltd. in Finland and/or other countries worldwide. All other trademarks are property of their respective owners.