A Map of All Our Failures: Why Reliable Systems Need More Deadline Misses
Authors:
Silviu S. Craciunas
Abstract:
"In real-time systems, we typically treat a bug or a deadline miss as a catastrophic failure. In this talk, I argue the opposite: a deadline miss is not a failure; it is a signal. During development and integration, we actually need more deadline misses, particularly as software complexity rises. Some architectural choices, especially regarding scheduling mechanisms and TSN shapers, often mask errors and deadline misses, turning model inaccuracies into non-deterministic 'ghosts' that are impossible to trace. We will explore how strict architectures, such as Time-Triggered designs, can transform these rare, mysterious anomalies into frequent, 'boring' failures, arguing that the best design strategy is to let your system fail loudly, frequently, and deterministically. By forcing hidden errors to manifest as chronic, reproducible deadline misses, we shift the focus from pure reliability to 'debuggability', an important ingredient in building complex, safe systems."