Data-Flow Sensitive Fault Space Pruning for RISC-V

  • Type of thesis: Bachelor's/Master's thesis
  • Status: reserved
  • Supervisor: Christian Dietrich

Fault-tolerance mechanisms are commonly tested with extensive fault-injection experiments: faults that mimic the effects of physical causes such as radiation (soft errors, i.e. bit flips) are injected into a system, and the system's behaviour is then observed. There are different strategies for performing such injections. In the most basic variant, one injects a fault into every bit in every cycle; the set of all these combinations spans the so-called fault space. However, this strategy is not a viable option for longer-running programs that operate on large data sets.
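To get a feeling for why exhaustive injection does not scale, the size of the basic fault space can be computed directly. The following is a minimal sketch; the helper name `fault_space_size` and its parameters are assumptions made for illustration and are not part of FAIL* or DFPrune.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of single-bit injections in the basic model,
// where every bit of the observed state can be flipped in every cycle.
std::uint64_t fault_space_size(std::uint64_t cycles, std::uint64_t state_bits) {
    // Guard against overflow of the 64-bit product.
    assert(state_bits == 0 || cycles <= UINT64_MAX / state_bits);
    return cycles * state_bits;
}
```

For example, observing 1 MiB of memory (8,388,608 bits) over one million cycles already yields 8,388,608,000,000 possible injections.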

Therefore, fault-space pruning techniques were developed that subsume multiple faults of this basic injection model into an equivalence set, where an injection into any member of the set yields exactly the same result. Consequently, we only have to inject one fault from each set, the fault-injection pilot, and project its result back onto all members of the set. One basic method for forming such equivalence sets is the so-called def/use pruning method, which subsumes all possible faults in the same memory location between two consecutive read/write events. In a nutshell, as long as a faulty bit is stored passively in memory, it cannot have any influence on the system.
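The def/use idea can be sketched on a memory-access trace. The code below is an illustrative simplification, not the FAIL* implementation: the types `Event` and `EquivClass` and the function `def_use_prune` are hypothetical names, and it assumes that a fault injected at time t takes effect before the access at time t. Every interval that ends in a read forms one equivalence set that needs a single pilot injection; an interval that ends in a write needs none, because the fault is overwritten.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

enum class Access { Read, Write };

// One memory access from a recorded trace (hypothetical format).
struct Event {
    std::uint64_t time;
    std::uint32_t addr;
    Access kind;
};

// All injection times in [first, last] at `addr` behave identically.
struct EquivClass {
    std::uint32_t addr;
    std::uint64_t first;
    std::uint64_t last;
    bool needs_injection;  // false if the interval ends in a write
};

// Expects `trace` to be sorted by time.
std::vector<EquivClass> def_use_prune(const std::vector<Event>& trace) {
    std::map<std::uint32_t, std::uint64_t> next_start;  // first uncovered time per address
    std::vector<EquivClass> classes;
    std::uint64_t prev_time = 0;
    for (const Event& e : trace) {
        assert(e.time >= prev_time);
        prev_time = e.time;
        std::uint64_t& start = next_start[e.addr];  // defaults to time 0
        if (start <= e.time) {
            classes.push_back({e.addr, start, e.time, e.kind == Access::Read});
        }
        start = e.time + 1;
    }
    return classes;
}
```

For a location that is written at time 2 and read at times 5 and 9, this yields the intervals [0, 2] (no injection needed), [3, 5] and [6, 9], so two pilot injections per bit cover ten injection times.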

In previous work, we developed an extension to the classic def/use pruning technique: data-flow sensitive fault pruning (DFPrune). This technique uses not only the read/write behaviour of the executed assembler instructions but also their instruction semantics. For example, flipping a bit in one argument of an XOR instruction is exactly equivalent to flipping the same bit position in the output of the instruction. Other instructions with fault propagation that is easy to understand include MOV, AND, and OR.
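The XOR equivalence can be checked with a few lines of code. This is a sketch for illustration only; the function names are hypothetical and do not mirror the DFPrune code base. The AND helper shows plain bitwise arithmetic (a flipped input bit is masked whenever the other operand has a 0 at that position) and is not a statement about how DFPrune formulates its rules.

```cpp
#include <cassert>
#include <cstdint>

// Flip a single bit of a 32-bit value.
std::uint32_t flip_bit(std::uint32_t value, unsigned bit) {
    assert(bit < 32);
    return value ^ (std::uint32_t{1} << bit);
}

// XOR: a flip in bit i of one input equals a flip in bit i of the output.
bool xor_fault_equivalent(std::uint32_t a, std::uint32_t b, unsigned bit) {
    return (flip_bit(a, bit) ^ b) == flip_bit(a ^ b, bit);
}

// AND: a flip in bit i of `a` has no effect if bit i of `b` is 0.
bool and_fault_masked(std::uint32_t a, std::uint32_t b, unsigned bit) {
    return (flip_bit(a, bit) & b) == (a & b);
}
```

`xor_fault_equivalent` holds for every operand pair and bit position, which is what allows a fault in the input and a fault in the output to share one equivalence set.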

The goal of this thesis is to refactor the existing DFPrune code base and adapt it to also support the RISC-V architecture. Our working C++ research prototype is currently limited to the x86 architecture. The basis for this thesis is the FAIL* open-source project.

Tasks

If you write this thesis as a Bachelor's thesis, the RISC-V integration is only an optional extension goal.

  • Refactor the existing DFPrune code base to better reflect the method described in our LCTES '21 paper.
  • Add support for the RISC-V architecture and manually implement some local fault-space equivalence rules for RISC-V.
  • Quantify the run-time of the refactored DFPrune prototype and validate the pruning results for RISC-V.
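As a possible starting point for the RISC-V rules, the bitwise instructions mentioned above can be recognised directly from their RV32I encoding. The sketch below is an assumption about how such a rule lookup might begin; `BitwiseOp`, `encode_r_type`, and `classify_rv32i` are hypothetical names and not part of FAIL* or DFPrune. It relies only on the standard R-type layout (opcode 0x33, funct7 0x00, funct3 0x4/0x6/0x7 for XOR/OR/AND).

```cpp
#include <cassert>
#include <cstdint>

enum class BitwiseOp { Xor, Or, And, Other };

// Assemble an R-type instruction word from its fields (used for examples).
std::uint32_t encode_r_type(std::uint32_t funct7, std::uint32_t rs2, std::uint32_t rs1,
                            std::uint32_t funct3, std::uint32_t rd, std::uint32_t opcode) {
    assert(funct7 < 128 && rs2 < 32 && rs1 < 32 && funct3 < 8 && rd < 32 && opcode < 128);
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode;
}

// Classify an instruction word as one of the register-register bitwise ops.
BitwiseOp classify_rv32i(std::uint32_t insn) {
    const std::uint32_t opcode = insn & 0x7F;
    const std::uint32_t funct3 = (insn >> 12) & 0x7;
    const std::uint32_t funct7 = insn >> 25;
    // funct7 must be 0: with funct7 = 1 the same opcode encodes the M extension.
    if (opcode != 0x33 || funct7 != 0x00) return BitwiseOp::Other;
    switch (funct3) {
        case 0x4: return BitwiseOp::Xor;
        case 0x6: return BitwiseOp::Or;
        case 0x7: return BitwiseOp::And;
        default:  return BitwiseOp::Other;
    }
}
```

For instance, `xor a0, a0, a1` is encoded as 0x00b54533 and is classified as `BitwiseOp::Xor`, while `add a0, a0, a1` (0x00b50533) falls into `BitwiseOp::Other`.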

Further Reading

  • LCTES Conference A
    Data-Flow–Sensitive Fault-Space Pruning for the Injection of Transient Hardware Faults
    Oskar Pusz, Christian Dietrich, Daniel Lohmann. Proceedings of the 2021 ACM SIGPLAN/SIGBED Conference on Languages, Compilers and Tools for Embedded Systems (LCTES '21), ACM Press, 2021.
    DOI: 10.1145/3461648.3463851
  • EDCC Conference
    FAIL*: An Open and Versatile Fault-Injection Framework for the Assessment of Software-Implemented Hardware Fault Tolerance
    Horst Schirmeier, Martin Hoffmann, Christian Dietrich, Michael Lenz, Daniel Lohmann, Olaf Spinczyk. Proceedings of the 11th European Dependable Computing Conference (EDCC '15), 2015.
    DOI: 10.1109/EDCC.2015.28
  • FAIL*