Efficient Cardiac Interval Estimation from Seismocardiography Using Wavelet-Based Neural Networks

Type of thesis: Master's thesis
Status: open
Projects: KORVEKSiS
Supervisor: Kazi Mohammad Abidur Rahman

Before continuing, please visit: Master Thesis

TL;DR

Design, train, and embed a wavelet-front-end neural pipeline (DWT / WPT / scattering, with a CNN, LSTM, or TCN backbone) for cardiac-interval regression from SCG, and benchmark it against a raw-signal baseline on the nRF52840 to quantify the accuracy–energy trade-off under near-sensor constraints.

Motivation

Seismocardiography (SCG) measures heart-induced chest vibrations and enables estimation of key cardiac intervals such as the Pre-Ejection Period (PEP), Left-Ventricular Ejection Time (LVET), Isovolumic Contraction/Relaxation Times (ICT/IRT), and the Myocardial Performance Index (MPI). However, reliable extraction of these intervals is challenging due to noise, inter-subject variability, and subtle waveform features.

Recent deep learning approaches operate directly on raw SCG signals. While effective, they require large models and datasets, making them inefficient for embedded systems such as wearable devices. SCG signals exhibit structured time–frequency patterns that can be captured using wavelet-based methods. A prior study from the Smart Sensors Group demonstrated that the Discrete Wavelet Transform (DWT) provides an efficient, zero-parameter representation on embedded hardware (nRF52840), but did not integrate this into a full learning pipeline.

This thesis addresses this gap by combining wavelet-based representations with neural networks and evaluating their efficiency under embedded constraints.


Figure: Candidate architectural directions sketched in the precursor study (Arch A — DWT front-end, Arch B — WPT front-end, Arch C — scattering front-end). For illustration only — the concrete decomposition depths, retained subbands / leaves, fusion strategies, and downstream backbone (CNN, LSTM, TCN, or hybrid) are open design choices to be explored within this thesis.

Research Question

To what extent does a fixed multi-resolution wavelet front-end (DWT, WPT, scattering) reduce the parameter, sample, and energy budgets of neural cardiac-interval regression from SCG, relative to raw-signal baselines, under embedded constraints?

Objectives

The thesis will deliver a reproducible, end-to-end study covering the following objectives. Concrete decomposition depths, retained subband subsets, and front-end hyperparameters are outcomes of the investigation, not preconditions: they are to be determined empirically through principled sensitivity studies, not fixed a priori.
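As a starting point for such a sensitivity study, the sketch below decomposes one beat at several depths and reports how the signal energy distributes over the subbands. It is only an illustration under stated assumptions: it uses the Haar wavelet because that keeps the code dependency-free (the wavelet family to be used in the thesis is an open choice), the beat is a synthetic two-tone signal rather than real SCG, and the names `haar_wavedec` and `subband_energy_fractions` as well as the 256 Hz sampling rate are hypothetical.

```python
import math

def haar_dwt_step(x):
    """One level of the orthonormal Haar DWT; len(x) must be even."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail

def haar_wavedec(x, levels):
    """Multi-level Haar DWT, returned coarsest first: [cA_L, cD_L, ..., cD_1]."""
    if levels < 1 or len(x) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = list(x)
    details = []
    for _ in range(levels):
        approx, d = haar_dwt_step(approx)
        details.append(d)          # finest detail band is appended first
    return [approx] + details[::-1]

def subband_energy_fractions(coeffs):
    """Fraction of the total coefficient energy held by each subband."""
    energies = [sum(c * c for c in band) for band in coeffs]
    total = sum(energies)
    return [e / total for e in energies]

# Synthetic stand-in for one SCG beat (assumption: 256 samples at 256 Hz).
FS = 256.0
beat = [math.sin(2 * math.pi * 10 * n / FS) + 0.3 * math.sin(2 * math.pi * 40 * n / FS)
        for n in range(256)]

for levels in (2, 3, 4):
    coeffs = haar_wavedec(beat, levels)
    names = ["cA%d" % levels] + ["cD%d" % l for l in range(levels, 0, -1)]
    fractions = subband_energy_fractions(coeffs)
    summary = ", ".join("%s=%.3f" % (n, f) for n, f in zip(names, fractions))
    print("depth %d: %s" % (levels, summary))
```

Because the Haar transform in this form is orthonormal, the subband energies sum to the energy of the input, so the fractions are comparable across depths. Energy is only one candidate criterion; the downstream interval-estimation utility of a subband still has to be measured by training.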

The downstream neural regressor backbone is also a design degree of freedom. The student is free to choose between CNN, LSTM/GRU, Temporal Convolutional Network (TCN), or hybrid variants for each architecture, and is encouraged to motivate the choice in light of SCG's local morphology versus its longer-range temporal dependencies. The same backbone family must, however, be used consistently for fair comparison within a given experiment, and the choice must be reported alongside the matched-budget constraint.
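To make the matched-budget constraint concrete, the following sketch counts trainable parameters for a few standard layer types and checks whether two candidate models fall within a tolerance of each other. The layer sizes, the 5 % tolerance, and all function names are hypothetical; the formulas are the usual textbook counts, with `biases_per_gate=2` reproducing the two bias vectors that PyTorch's `nn.LSTM` keeps per gate.

```python
def conv1d_params(in_ch, out_ch, kernel, bias=True):
    """Weights plus optional bias of an ungrouped 1-D convolution."""
    return out_ch * in_ch * kernel + (out_ch if bias else 0)

def lstm_params(input_size, hidden, biases_per_gate=1):
    """Single-layer, unidirectional LSTM: 4 gates, each with input weights,
    recurrent weights, and one or two bias vectors."""
    return 4 * (hidden * input_size + hidden * hidden + biases_per_gate * hidden)

def linear_params(in_features, out_features):
    """Fully connected layer with bias."""
    return in_features * out_features + out_features

def within_budget(n_a, n_b, tol=0.05):
    """True if the two counts differ by at most tol, relative to the larger one."""
    return abs(n_a - n_b) / max(n_a, n_b) <= tol

N_TARGETS = 5  # PEP, LVET, ICT, IRT, MPI

def small_cnn(in_ch, width=16, kernel=7):
    """Hypothetical two-layer CNN with global pooling before the linear head."""
    return (conv1d_params(in_ch, width, kernel)
            + conv1d_params(width, width, kernel)
            + linear_params(width, N_TARGETS))

raw_model = small_cnn(in_ch=1)  # raw beat as a single input channel
dwt_model = small_cnn(in_ch=4)  # assumption: four retained subbands as channels

print("raw:", raw_model, "dwt:", dwt_model,
      "matched:", within_budget(raw_model, dwt_model))
```

In this toy case the extra input channels alone put the two models roughly 14 % apart (relative to the larger one), so the raw baseline would have to be widened slightly before the comparison counts as matched.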

O1 — Subband-decomposition pipeline (Arch A). Design and train a neural regressor whose input is a multi-resolution DWT decomposition of the SCG beat, with the decomposition depth and the subset of retained subbands chosen empirically based on physiological energy distribution and downstream interval-estimation utility.
O2 — Wavelet-packet pipeline (Arch B). Design and train a neural regressor over a wavelet-packet decomposition that adaptively refines selected regions of the time–frequency plane, with leaf retention governed by a relevance criterion (e.g. energy, mutual information with the regression target, or learned attention).
O3 — Scattering pipeline (Arch C). Design and train a neural regressor preceded by a fixed analytic scattering front-end, providing a parameter-free, analytically defined representation as the reference point for structured-input models.
O4 — Raw-signal baseline. Establish raw-input neural baselines (in the same backbone family used for Arch A–C) at matched parameter and FLOP budgets to isolate the contribution of the structured front-end from model capacity and from backbone choice.
O5 — Multi-criteria architectural comparison. Evaluate all pipelines under a subject-disjoint cross-validation protocol (e.g. Leave-One-Subject-Out, k-fold subject-grouped, leave-N-subjects-out, or any equivalent scheme that prevents subject leakage between training and evaluation) — the specific choice is left to the student and must be motivated and applied consistently across all architectures. Compare along all of the following axes — accuracy alone is insufficient evidence for or against a structured front-end:
- Predictive accuracy: MAE, MAPE, Bland–Altman limits of agreement, and Lin's CCC for PEP, LVET, ICT, IRT, MPI.
- Parameter and compute efficiency: trainable parameter count, MACs / FLOPs per beat, model size on disk.
- Sample efficiency: learning curves over fractional training-set sizes; how quickly each architecture reaches a target accuracy.
- Generalisation: inter-subject and inter-session variability; degradation under cohort or recording-condition shift.
- Robustness: tolerance to additive noise, motion artefacts, posture changes, baseline wander, and sensor-placement perturbations.
- Training stability: seed-to-seed variance, sensitivity to learning-rate / regularisation choices, convergence behaviour.
- Calibration and uncertainty: prediction-interval coverage, beat-level confidence versus realised error, identification of unreliable beats.
- Interpretability: which scales / subbands / time windows drive predictions (saliency, attribution, leave-one-out analysis); whether the learned feature emphasis aligns with known fiducial morphology.
- Quantisation sensitivity: accuracy retention under fixed-point inference (float vs. int8 / q15) — a prerequisite for the embedded port.
- Reproducibility: deterministic training pipelines, environment pinning, and complete release of seeds/configs/checkpoints.
O6 — Embedded realisation and on-silicon characterisation. Port the most favourable architecture to a Cortex-M-class target (nRF52840) and characterise it in terms of end-to-end latency, energy per inference (PPK2), RAM/flash footprint, real-time feasibility for streaming SCG, and the accuracy delta introduced by quantisation. The deliverable is an embedded accuracy–energy–footprint Pareto, not a single operating point.
O7 (exploratory) — Subband-driven distributed inference. Investigate minimal subband subsets that, when transmitted under a BLE-class uplink budget, preserve interval accuracy, enabling principled edge–cloud partitioning of the pipeline.
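Subject leakage, which O5 rules out, is easiest to prevent by splitting on subject identifiers instead of on beats. The following sketch shows Leave-One-Subject-Out and a subject-grouped k-fold split in plain Python, plus a guard that fails loudly if a subject appears on both sides. The subject IDs and function names are hypothetical; scikit-learn's `LeaveOneGroupOut` and `GroupKFold` provide equivalent splitters if that library is used.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    for held_out in sorted(set(subject_ids)):
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        yield held_out, train_idx, test_idx

def grouped_k_fold(subject_ids, k):
    """Yield (train_indices, test_indices); each subject lands in exactly one test fold."""
    subjects = sorted(set(subject_ids))
    if not 2 <= k <= len(subjects):
        raise ValueError("k must be between 2 and the number of subjects")
    fold_of = {s: i % k for i, s in enumerate(subjects)}  # round-robin assignment
    for fold in range(k):
        test_idx = [i for i, s in enumerate(subject_ids) if fold_of[s] == fold]
        train_idx = [i for i, s in enumerate(subject_ids) if fold_of[s] != fold]
        yield train_idx, test_idx

def check_no_leakage(subject_ids, train_idx, test_idx):
    """Raise if any subject contributes beats to both the training and the test set."""
    overlap = {subject_ids[i] for i in train_idx} & {subject_ids[i] for i in test_idx}
    if overlap:
        raise ValueError("subject leakage: %s" % sorted(overlap))

# One entry per beat: which subject the beat was recorded from (made-up IDs).
subject_ids = ["s01", "s01", "s02", "s02", "s02", "s03"]

for held_out, train_idx, test_idx in leave_one_subject_out(subject_ids):
    check_no_leakage(subject_ids, train_idx, test_idx)
    print(held_out, "train:", train_idx, "test:", test_idx)
```

Whichever scheme is chosen, the same fold assignment should be stored and reused for every architecture so that paired statistical comparisons remain valid.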
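The agreement metrics listed under O5 can be computed with a few lines of standard-library Python. The sketch below uses one common formulation, stated here as an assumption rather than as the protocol of this thesis: Bland–Altman limits are taken as bias ± 1.96 sample standard deviations of the paired differences, and Lin's CCC uses population (1/n) moments. The PEP values are made up for illustration.

```python
from statistics import mean, stdev

def mae(y_true, y_pred):
    """Mean absolute error, in the unit of the inputs."""
    return mean([abs(p - t) for t, p in zip(y_true, y_pred)])

def mape(y_true, y_pred):
    """Mean absolute percentage error; reference values must be non-zero."""
    return 100.0 * mean([abs((p - t) / t) for t, p in zip(y_true, y_pred)])

def bland_altman(y_true, y_pred):
    """Return (bias, lower limit, upper limit) of agreement."""
    diffs = [p - t for t, p in zip(y_true, y_pred)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

def lins_ccc(y_true, y_pred):
    """Lin's concordance correlation coefficient with 1/n moments."""
    n = len(y_true)
    mt, mp = mean(y_true), mean(y_pred)
    var_t = sum((t - mt) ** 2 for t in y_true) / n
    var_p = sum((p - mp) ** 2 for p in y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred)) / n
    return 2.0 * cov / (var_t + var_p + (mt - mp) ** 2)

# Made-up reference and predicted PEP values in milliseconds.
pep_ref = [100.0, 110.0, 90.0, 105.0, 95.0]
pep_pred = [102.0, 108.0, 93.0, 104.0, 97.0]

bias, lower, upper = bland_altman(pep_ref, pep_pred)
print("MAE  [ms]: %.2f" % mae(pep_ref, pep_pred))
print("MAPE [%%]: %.2f" % mape(pep_ref, pep_pred))
print("Bland-Altman bias [ms]: %.2f, limits: [%.2f, %.2f]" % (bias, lower, upper))
print("Lin's CCC: %.3f" % lins_ccc(pep_ref, pep_pred))
```

Unlike Pearson correlation, the CCC penalises a constant offset between prediction and reference, which is why it is reported alongside the Bland–Altman bias rather than instead of it.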

Work Plan and Timeline

The thesis is structured into six phases (M1–M6) over ~26 weeks, each ending in a concrete milestone.

Phase	Weeks	Scope	Key Activities	Milestone (finishing criteria)
M1 — Onboarding & reproduction	1 – 4	Alignment with the precursor study; project setup.	Repo / dataset / lab onboarding; reproduce the wavelet-family benchmark and embedded Pareto from the precursor study; literature review on wavelet–NN hybrids for biosignals; freeze the evaluation protocol (cross-validation scheme, metrics, seeds, splits, statistical-comparison procedure); choose and motivate the downstream backbone family (CNN / LSTM / GRU / TCN / hybrid).	Reproduced precursor figures + frozen protocol document committed to GitLab. (mandatory introductory presentation)
M2 — Arch A: DWT front-end + regressor	5 – 9	First structured-input pipeline; first head-to-head against the raw baseline.	Build the multi-branch / multi-channel regressor over the chosen DWT subbands; subband-selection and decomposition-depth sensitivity sweep; cross-validated training and hyperparameter search; train a matched-budget raw-signal baseline in the same backbone family; first comparison vs. baseline on PEP / LVET / ICT / IRT / MPI.	Trained Arch A model, comparison table vs. raw baseline, sensitivity-sweep results.
M3 — Arch B & Arch C: WPT and scattering front-ends	10 – 14	Wavelet-packet and analytic-scattering variants; cross-architecture comparison closes here.	Wavelet-packet decomposition with relevance-driven leaf pruning (energy / mutual information / learned attention) and fusion with coarser subbands; analytic scattering front-end (e.g. Kymatio) with depth and Q-factor sweep; consistent backbone across runs; cross-architecture sensitivity studies (filter family, decomposition depth, pruning policy); paired-bootstrap statistical comparison and CCC matrix across all four pipelines.	Complete architectural comparison report; mid-term presentation (optional).
M4 — Embedded realisation	15 – 19	Embedded port and on-silicon characterisation.	Quantisation (int8 / q15) of the favoured architecture; CMSIS-NN integration; RIOT-OS firmware integration with the existing q15 wavelet cascade; PPK2 energy capture; latency, RAM, and flash-footprint measurement; quantisation-induced accuracy delta.	On-device inference with measured accuracy–energy–footprint Pareto.
M5 — Exploratory / robustness study	20 – 22	One deeper investigation, chosen at end of M4.	Either subband-driven distributed inference: which subbands, transmitted at what rate, preserve interval accuracy under a BLE-class uplink budget — or robustness analysis: motion artefacts, posture changes, sensor placement, additive noise, and inter-subject / inter-session generalisation under cohort or recording-condition shift. The choice is motivated by progress and findings up to M4.	One additional analysis chapter.
M6 — Writing & defence	23 – 26	Thesis, defence, and reproducible release.	Thesis writing; defence preparation; tagged release of the reproducible code repository (training, evaluation, embedded firmware, measurement notebooks); optional draft of a short workshop / conference paper.	Submitted thesis, tagged GitLab repository, final defence.
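For the energy-per-inference figure in M4, a captured current trace has to be integrated over the inference window. A minimal sketch follows, assuming the trace has already been cropped to a single inference, is available as equally spaced current samples in µA, and that the supply voltage is constant. The function names, the sample rate, and the numbers in the example are hypothetical and do not reflect the PPK2 export format or any measured result.

```python
def energy_per_inference_uj(current_ua, sample_rate_hz, supply_v, idle_ua=0.0):
    """Integrate a current trace (in µA) with the trapezoidal rule; return energy in µJ.

    idle_ua is an optional baseline current that is subtracted before integration,
    so that only the energy attributable to the inference itself is counted.
    """
    if len(current_ua) < 2:
        raise ValueError("need at least two samples to integrate")
    dt = 1.0 / sample_rate_hz
    charge_uc = 0.0  # accumulated charge in microcoulombs (µA * s)
    for a, b in zip(current_ua, current_ua[1:]):
        charge_uc += 0.5 * ((a - idle_ua) + (b - idle_ua)) * dt
    return charge_uc * supply_v  # µC * V = µJ

def window_duration_ms(n_samples, sample_rate_hz):
    """Duration spanned by n equally spaced samples, in milliseconds."""
    return 1000.0 * (n_samples - 1) / sample_rate_hz

# Toy trace: a constant 5 mA draw over 10 ms, sampled at 1 kHz, on a 3.0 V supply.
trace_ua = [5000.0] * 11
energy = energy_per_inference_uj(trace_ua, sample_rate_hz=1000.0, supply_v=3.0)
latency = window_duration_ms(len(trace_ua), sample_rate_hz=1000.0)
print("latency: %.1f ms, energy: %.1f uJ" % (latency, energy))
```

Reporting the energy both with and without baseline subtraction makes it clear how much of the budget is inference and how much is the idle floor of the board.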

Deliverables

Written thesis in LaTeX (Smart Sensor Group template).
Reproducible code repository (training, evaluation, embedded firmware) with tagged release.
Trained model checkpoints and evaluation scripts.
nRF52840 firmware image plus build/flash documentation.
PPK2 measurement logs and post-processing notebooks.
Final defence presentation.
(Optional) draft of a short conference / workshop paper.

Working Mode and Expectations

Independence. You are expected to work autonomously: plan your work, identify blockers early, conduct literature research, and drive experiments forward independently.
Supervision budget. Up to 10 meetings (1 hour each) over 6 months. You decide how to use them (e.g., discussing blockers, reviewing results, thesis feedback). Asynchronous communication (messages, GitLab issues, merge requests) is unlimited and encouraged.
Documentation and version control. All work (code, configs, logs, reports) must be maintained in GitLab with clear commit history, issues for open questions, and merge requests for review. Reproducibility is mandatory.
Innovation is welcomed. The defined objectives are a baseline. You are encouraged to propose and explore your own ideas (e.g., new models, optimisations, evaluation methods), as long as core deliverables remain on track.

Evaluation Criteria for the Thesis

The Work Plan milestones and the Deliverables define the finishing criteria, i.e., what must be in place for the thesis to be considered complete. They do not define the grade. The final grade is determined holistically by the examiner across the full body of work, weighing in particular:

Scientific approach — quality of the research question framing, hypothesis design, experimental rigour, statistical soundness (e.g. correctness of the chosen cross-validation protocol and avoidance of subject leakage), and reproducibility.
Innovation and intellectual contribution — originality of ideas brought into the work beyond the prescribed scope; depth of insight into why a particular front-end helps or fails, not just which architecture wins.
Engineering and implementation quality — code clarity and structure, on-device numerical correctness (q15 vs. float reference), measurement integrity, embedded-system craftsmanship.
Written thesis — structure, clarity, figure quality, citation discipline, mathematical and notational precision.
Oral defence and presentation — ability to motivate the work, defend design decisions, and respond to technical questions.
Research conduct — independence, initiative, scientific honesty, and effective use of the limited supervision budget.

Required Skills

Required

Strong Python skills (NumPy, SciPy, PyTorch) and good software practices (Git, virtual environments, testing).
Solid background in digital signal processing (Fourier, wavelets, filter banks, multi-rate systems).
Knowledge of machine learning, especially sequence models (CNN, LSTM/GRU, TCN).
Basic experience with embedded C (Cortex-M, e.g., nRF52840) and low-level debugging.
Good English writing skills for technical documentation and reporting.

Desirable

Experience with wavelet/scattering tools (e.g., PyWavelets, Kymatio).
Familiarity with embedded ML frameworks (CMSIS-DSP, CMSIS-NN, TFLite Micro).
Experience with RTOS (RIOT-OS, Zephyr).
Understanding of model quantisation (int8, fixed-point).
Experience with power measurement tools (e.g., PPK2).
Knowledge of LaTeX (e.g., IEEE format).
Background in biomedical signal processing or wearable systems.

Application Process

Curriculum vitae.
Current transcript of records.
Short motivation (≤ 1 page) explaining your interest, relevant prior projects, and which of the objectives you find most exciting.
Optional: link to a representative code repository or prior project report.

Contact: please send a single PDF to the supervisor mentioned above; reference "MA—DWT-NN-SCG" in the email subject line.