Adaptive Model Selection for Real-Time Heart Disease Detection

Overview

As a Research Assistant at North Carolina State University (under Prof. Zhishan Guo), I contributed to building and evaluating an Adaptive Model Selection (AMS) framework for real-time cardiovascular disease detection on wearable embedded hardware — targeting deployment on a Raspberry Pi 4.

The core problem: ECG inference latency is bounded by the patient’s instantaneous heart rate (higher HR = shorter beat deadline), but accuracy increases with a heavier model. A fixed-complexity model either misses deadlines at high heart rate or wastes capacity at low heart rate. Our AMS framework solves this by dynamically selecting from three model tiers at every beat window based on real-time HR.
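The HR-to-deadline coupling above reduces to simple arithmetic. A minimal sketch (an illustration, not the paper's code): at a given heart rate, a new R-peak arrives roughly every 60000 / HR milliseconds, which upper-bounds the time available to process the current beat.

```python
# Per-beat inference budget implied by instantaneous heart rate.
# Higher HR => shorter inter-beat interval => tighter deadline.

def beat_budget_ms(hr_bpm: float) -> float:
    """Deadline (ms) for per-beat processing at a given heart rate."""
    return 60_000.0 / hr_bpm

# At 60 bpm the budget is 1000 ms; at 120 bpm it halves to 500 ms.
```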

Publication: “Adaptive Model Selection for Real-Time Heart Disease Detection on Embedded Systems” (2nd author)
IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA 2025)

Task Definition

The system performs 5-class cardiac severity classification (severity levels 0–4) on single-lead ECG data in real time. Rather than classifying individual disease types (the full dataset covers 72 disease categories), the system assigns a severity score to each heartbeat cycle — enabling timely alerts and continuous risk monitoring on a wearable without requiring a full diagnostic workup.

Dataset: PhysioNet 2021 Challenge — filtered to 22,359 single-label ECG recordings, split 64% training / 16% validation / 20% test.

Model Architecture

Each input segment contains β consecutive R–R cycles, each resampled to 256 samples. The model has two parallel branches fused via global attention:

ECG branch: A stem convolution (8β channels) → three Residual Blocks widening the channels to 16β → 32β → 64β, each containing a Squeeze-and-Excitation (SE) unit for channel recalibration → adaptive average pooling to length α.

Period branch: The inter-beat period vector passes through an FC block (Linear → BatchNorm → ELU), mapping to the same 64β feature space.

Global Attention fusion: The flattened ECG features and period embedding are concatenated, fed to a two-layer attention module that produces a sigmoid mask modulating the ECG features — allowing the network to weight cycle regions by their rhythm context.

Output: The attended features pass through two FC layers to produce 5 logits (severity 0–4).

This architecture couples morphological feature extraction (ResBlocks + SE) with rhythm-aware re-weighting (period branch + global attention), kept compact for embedded deployment.

AMS Framework and Anytime CNN

Three model tiers are realized as early exits of a single parameter-shared Anytime CNN backbone:

  • High HR (≥ 90 bpm): Lightweight exit — fastest path, 0.57 ms, handles tight deadlines.
  • Moderate HR (70–90 bpm): Moderate exit — adds one ResBlock+SE, 1.79 ms.
  • Low HR (< 70 bpm): Advanced exit — full depth with global attention, 1.94 ms, highest accuracy.

At every shifted window, the AMS controller reads instantaneous HR and routes to the deepest exit that still meets the beat’s timing deadline. All three exits are jointly trained with deep supervision (per-exit losses summed with equal weights), so each head remains independently accurate while sharing the backbone weights — keeping the total checkpoint under 5 MB in the two-cycle configuration.
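The routing policy can be sketched in a few lines (a hedged illustration mirroring the three HR bands listed above; the function name and structure are assumptions, not the paper's code):

```python
# Route each beat window to an exit tier by instantaneous heart rate,
# matching the HR bands from the AMS framework description:
#   >= 90 bpm -> lightweight exit (0.57 ms)
#   70-90 bpm -> moderate exit    (1.79 ms)
#   <  70 bpm -> advanced exit    (1.94 ms)

def select_exit(hr_bpm: float) -> str:
    """Pick the deepest exit whose latency fits the beat deadline."""
    if hr_bpm >= 90:
        return "lightweight"   # tightest deadline, fastest path
    elif hr_bpm >= 70:
        return "moderate"      # one extra ResBlock+SE
    return "advanced"          # full depth with global attention
```

Because all exits share one backbone, switching tiers between beats costs no weight reload.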

Results

  Model                  Cycles  Accuracy  F1     Inference (ms)  Deadline Misses
  AMS + Anytime          2       91.5%     90.6%  1.33            0/1000
  Advanced (standalone)  2       92.6%     91.1%  1.94            431/1000
  Moderate               2       87.8%     87.7%  1.79            259/1000
  Lightweight            2       86.5%     86.6%  1.05            0/1000
  CNN-LSTM (baseline)    2       87.3%     87.6%  3.33            1000/1000

Key finding: Two cardiac cycles is the optimal input length — one extra beat provides enough temporal context to improve accuracy meaningfully, while three or four cycles push latency past the real-time budget. The AMS+Anytime configuration achieves the accuracy sweet spot (91.5%) with zero deadline misses across all heart-rate regimes.

Technical Details

Preprocessing:

  • R-peaks detected using Hamilton’s algorithm (BioSPPy library); heartbeat cycles extracted as R–R intervals and resampled to 256 points.
  • Labels assigned per-cycle based on the recording’s severity score; multi-label recordings excluded to eliminate annotation ambiguity.
  • Fixed preprocessing/label-alignment issues in the PhysioNet dataset that caused unstable cross-fold metrics.
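The fixed-length resampling step above can be sketched with plain linear interpolation (an assumption about the implementation; the paper specifies BioSPPy's Hamilton detector for the R-peaks themselves, not this resampler):

```python
# Resample a variable-length R-R segment onto a fixed grid so every
# heartbeat cycle has the same length (256 samples in the paper).

def resample(segment, n_out=256):
    """Linearly interpolate `segment` to exactly `n_out` samples."""
    n_in = len(segment)
    out = []
    for i in range(n_out):
        # Map output index to a fractional position in the input.
        pos = i * (n_in - 1) / (n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        out.append(segment[lo] * (1 - frac) + segment[hi] * frac)
    return out
```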

Scheduling:

  • EDF (Earliest-Deadline-First) schedulability analysis verified the system can co-exist with other concurrent tasks (UI, Bluetooth, sensor fusion) on a uniprocessor without deadline violations.
  • A microsecond-resolution watchdog can pre-empt inference at a configurable fraction of the beat budget and fall back to a shallower exit if needed.
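The uniprocessor EDF check reduces to the classic utilization bound: a set of implicit-deadline periodic tasks is EDF-schedulable iff the total utilization sum(C_i / T_i) is at most 1. A minimal sketch (the co-resident task numbers are illustrative assumptions, not measurements from the paper):

```python
# EDF schedulability on a uniprocessor: feasible iff total utilization
# (worst-case execution time / period, summed over tasks) <= 1.

def edf_schedulable(tasks):
    """tasks: list of (wcet_ms, period_ms) pairs."""
    return sum(c / t for c, t in tasks) <= 1.0

# Hypothetical task set: advanced-exit inference once per beat at 90 bpm
# (1.94 ms every ~667 ms), plus illustrative UI, Bluetooth, and
# sensor-fusion tasks sharing the core.
tasks = [(1.94, 667), (5.0, 100), (2.0, 50), (1.0, 20)]
```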

Training:

  • Adam optimizer, lr 0.001, batch size 128, early stopping on validation loss.
  • Multi-exit deep supervision: losses from all three exit heads summed with equal weights.
  • Evaluated on Raspberry Pi 4 (quad-core ARM Cortex-A72) as a proxy for commercial wearable SoCs.
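The equal-weight deep-supervision objective can be sketched as a plain sum of per-exit losses (a simplified illustration with a hand-rolled negative log-likelihood; the actual training framework is not specified here):

```python
# Multi-exit deep supervision: one loss per exit head, summed with
# equal weights so gradients reach every head and the shared backbone.

import math

def nll(probs, target):
    """Negative log-likelihood of the target class."""
    return -math.log(probs[target])

def deep_supervision_loss(per_exit_probs, target):
    """Sum the per-exit losses with equal (unit) weights."""
    return sum(nll(p, target) for p in per_exit_probs)
```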

Challenges

  1. Latency–accuracy trade-off at the per-beat level: No single fixed model can meet deadlines at high HR while maximizing accuracy at low HR. The AMS+Anytime design resolves this by making depth selection a runtime policy rather than a design-time choice.

  2. Label-alignment bugs in PhysioNet preprocessing: Early experiments showed high cross-fold metric variance. Root cause was windowing misalignment causing future-label leakage. Fixing alignment via Hamilton R-peak anchoring eliminated the variance.

  3. Memory budget on embedded SoC: Three independent checkpoints would exceed wearable SRAM. Parameter sharing via early-exit architecture brings the two-cycle AMS model to under 5 MB — feasible for a smartwatch.

Reflection and Insights

The most important insight from this project: adaptive depth selection is not an optimization — it is a prerequisite for correctness in real-time embedded ML. A model that achieves 92.6% accuracy in batch evaluation but misses 431 out of 1000 deadlines on-device is not a working real-time system. Framing the problem through the lens of schedulability analysis (EDF, utilization bounds) made this explicit and led directly to the AMS design. The secondary insight is that multi-exit parameter sharing is the right architectural response to memory-constrained deployment: all complexity levels coexist in one checkpoint, switchable with zero weight reload overhead.

Team and Role

Research at NCSU under Prof. Zhishan Guo. My responsibilities: co-designing the CNN architecture (ResBlocks + SE + Global Attention), debugging the PhysioNet preprocessing pipeline, benchmarking model tiers on Raspberry Pi, contributing to AMS framework design, and co-authoring the RTCSA 2025 paper.

Cross-Modal LLM-Based Robotic Arm Interaction and Control System

Overview

As a Research Assistant at Southern University of Science and Technology (SUSTech), I led a research project integrating large language models (LLMs) with physical robotic arm control. The core idea was to allow a user to issue natural-language instructions to a robotic arm — and have those instructions reliably translated into executable, safe robot motion sequences — by designing an LLM agent with a structured tool/function-calling interface over a ROS action layer.

I led the proposal defense that secured research funding for the project, then drove the full system from design to physical hardware validation.

Results

  • Successfully demonstrated end-to-end natural-language-to-motion on a JAKA Zu5 robotic arm: spoken/typed instructions → structured action sequence → physical execution.
  • Validated the full stack in RViz/Gazebo simulation and on the physical JAKA Zu5 arm, confirming sim-to-real transfer with no manual re-tuning.
  • The LLM middleware correctly handled asynchronous inference latency: instructions were queued, execution proceeded without blocking, and feedback was synchronized on completion.

Technical Details

LLM Agent Design:

  • Designed an LLM agent with a tool/function-calling interface that maps natural-language commands to a predefined catalog of ROS action primitives (e.g., move-to-pose, grasp, release, home).
  • Implemented schema-based argument validation for each tool: the LLM must produce structured JSON arguments matching the action schema before any motion is dispatched, preventing malformed or unsafe commands from reaching the robot.
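The validation gate can be sketched as follows (the schema and tool name are illustrative assumptions, not the project's actual action catalog): LLM-produced JSON must match the action's argument schema exactly before anything is dispatched.

```python
# Schema-based argument validation: reject malformed LLM output before
# it can reach the robot as a motion command.

import json

# Hypothetical schema for a move-to-pose primitive.
MOVE_TO_POSE_SCHEMA = {
    "x": float, "y": float, "z": float,  # target position (m)
    "speed": float,                      # velocity scale in (0, 1]
}

def validate_args(raw_json: str, schema: dict):
    """Parse and type-check LLM tool arguments against a schema."""
    args = json.loads(raw_json)
    if set(args) != set(schema):
        raise ValueError(f"field mismatch: got {sorted(args)}")
    for key, typ in schema.items():
        if not isinstance(args[key], typ):
            raise ValueError(f"{key} must be {typ.__name__}")
    return args  # only now safe to dispatch as a ROS action goal
```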

Sim-to-Real Stack (JAKA Zu5):

  • Refined the robot’s URDF: corrected link/joint frame origins and collision meshes to match the physical arm’s geometry.
  • Integrated the end-effector gripper into the URDF and MoveIt configuration.
  • Configured MoveIt motion planning with per-joint velocity and acceleration limits to ensure smooth, collision-free trajectories within hardware safety bounds.

ROS Action Middleware:

  • Built a custom ROS action middleware layer that bridges the asynchronous nature of LLM inference with real-time robot execution: actions are queued and dispatched with proper scheduling; the middleware provides feedback callbacks to the LLM agent so it can reason about the current execution state before issuing the next instruction.
  • Integrated vision-based pose estimation for object localization, feeding spatial information back into the LLM context to support pick-and-place style interactions.
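The decoupling in the middleware can be sketched with a queue and a worker thread (class and method names are assumptions; the real layer dispatches ROS action goals rather than the stub shown):

```python
# Toy middleware: the LLM planner enqueues actions without blocking;
# a worker thread executes them in order and records completion
# feedback the planner can read before issuing the next instruction.

import queue
import threading

class ActionMiddleware:
    def __init__(self):
        self._q = queue.Queue()
        self.feedback = []  # completed (action, status) pairs
        threading.Thread(target=self._run, daemon=True).start()

    def dispatch(self, action, **args):
        """Non-blocking for the planner: just enqueue the request."""
        self._q.put((action, args))

    def _run(self):
        while True:
            action, args = self._q.get()
            # Real system: send a ROS action goal and await its result.
            self.feedback.append((action, "succeeded"))
            self._q.task_done()

    def wait_idle(self):
        """Synchronize: block until all queued actions have finished."""
        self._q.join()
```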

Challenges

  1. LLM latency vs. real-time execution: LLM inference is slow and non-deterministic in timing, but robot control requires timely, predictable action dispatch. Solved by decoupling inference and execution into separate threads with an action queue and explicit feedback synchronization rather than naive sequential calls.

  2. Schema drift and hallucination: LLMs sometimes generate structurally invalid tool arguments. The schema-based validation layer acts as a strict interface contract — invalid arguments are rejected with an error message re-injected into the LLM context, prompting self-correction before execution.

  3. URDF fidelity for sim-to-real transfer: Early Gazebo simulations showed trajectory drift on the physical arm due to inaccurate link inertia and joint offset values in the URDF. Systematically measuring and correcting these values eliminated the gap between simulated and physical behavior.
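The reject-and-retry loop from challenge 2 can be sketched generically (llm_call and validate are placeholder callables, not the project's API): a validation error becomes a message appended to the conversation context, and the model is re-queried up to a retry budget.

```python
# Self-correction loop: re-inject validation errors into the LLM
# context so the model can repair its own tool arguments.

def correct_until_valid(llm_call, validate, prompt, max_retries=3):
    """Query the LLM until `validate` accepts its output, or give up."""
    context = [prompt]
    for _ in range(max_retries):
        raw = llm_call(context)
        try:
            return validate(raw)
        except ValueError as err:
            # Feed the error back so the next attempt can self-correct.
            context.append(f"Invalid arguments: {err}. Please retry.")
    raise RuntimeError("LLM failed to produce valid arguments")
```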

Reflection and Insights

This project made concrete the gap between LLM capability and deployment reliability: the model can understand intent fluently, but without an explicit typed interface contract and validation layer, the output is too unpredictable to safely actuate physical hardware. The schema/tool-calling design pattern — essentially treating the LLM as a high-level planner that calls typed APIs — was the key architectural insight that made the system robust. This pattern has become broadly influential in agentic LLM system design, and experiencing it in the context of a safety-critical physical system gave me a deep intuition for where it is and isn’t sufficient.

Team and Role

Research conducted at SUSTech under faculty supervision. My responsibilities included leading the proposal defense to secure funding, designing the LLM agent architecture and tool-calling interface, building the ROS action middleware, refining the JAKA Zu5 URDF and MoveIt configuration, integrating vision-based pose estimation, and coordinating system validation in simulation and on physical hardware.