Engineering
My engineering work focuses on real-time computer vision systems, multimodal perception, and efficient deployment across NVIDIA GPU/CUDA, mobile, SoC, and MCU platforms.
Overview
- System types: C++ SDKs, real-time demos, embedded perception pipelines, and validation platforms
- Technical themes: multimodal perception, geometry-aware vision, on-device deployment, and inference acceleration
- Deployment targets: NVIDIA GPU/CUDA platforms, mobile, SoC, and MCU
Multimodal Pose and Hand Tracking Demo
Designed lightweight multimodal models for pose and hand tracking from grayscale and event-based input, then built real-time PC demos to showcase the system. Extended model outputs with motion cues and implemented an asynchronous pipeline to improve tracking behavior and throughput.
Focus
- Lightweight multimodal detection and tracking
- Real-time demo system implementation
- Tracking-oriented output design and asynchronous execution
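The asynchronous execution described above can be sketched as a small producer/consumer primitive: a bounded queue that drops the oldest frame when full, so the tracker always consumes recent input instead of falling behind the camera. This is a minimal illustration, not the project's actual pipeline code; the class name is hypothetical.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>

// Bounded frame queue that favors freshness: when full, the oldest
// frame is dropped so the consumer never processes stale input.
template <typename T>
class LatestFrameQueue {
 public:
  explicit LatestFrameQueue(std::size_t capacity) : capacity_(capacity) {}

  void push(T item) {
    std::lock_guard<std::mutex> lock(mu_);
    if (items_.size() == capacity_) items_.pop_front();  // drop stale frame
    items_.push_back(std::move(item));
    cv_.notify_one();
  }

  T pop() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return !items_.empty(); });
    T item = std::move(items_.front());
    items_.pop_front();
    return item;
  }

 private:
  std::size_t capacity_;
  std::deque<T> items_;
  std::mutex mu_;
  std::condition_variable cv_;
};
```

Decoupling capture and inference this way lets each stage run at its own rate, which is the usual reason an asynchronous pipeline improves both tracking behavior and throughput.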
Real-Time Gaze Tracking SDK
Built a real-time C++ SDK for face detection and gaze tracking with RGB and event-based input support. Integrated calibration, screen-space mapping, and runtime inference into a PC validation platform running at real-time speed.
Focus
- RGB and event-based multimodal input
- Real-time PC validation platform
- C++ SDK interface and runtime integration
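One way to picture the screen-space mapping step is a per-user affine calibration that maps a gaze direction (yaw, pitch) to pixel coordinates. The sketch below is illustrative only: the struct and field names are assumptions, and the coefficients are presumed to be fitted offline from calibration targets.

```cpp
#include <array>

// Hypothetical per-user calibration mapping gaze angles (radians)
// to screen pixels via a row-major 2x3 affine transform.
struct GazeCalibration {
  std::array<double, 6> a;  // [sx, kx, tx, ky, sy, ty]

  std::array<double, 2> to_screen(double yaw, double pitch) const {
    return {a[0] * yaw + a[1] * pitch + a[2],
            a[3] * yaw + a[4] * pitch + a[5]};
  }
};
```

An affine model is a common baseline here; a real SDK would likely layer per-session refinement on top of it.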
Mobile Multi-Frame Denoising SDK
Implemented a C++ SDK for mobile-oriented multi-frame image denoising from an existing Python reference. Designed a reusable tile-based computation abstraction to support a stable alignment-and-merge pipeline and future SIMD optimization.
Focus
- Python-to-C++ translation for deployment
- Reusable tile-based computation abstraction
- Mobile-oriented SDK design
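The tile-based abstraction above can be sketched as a single iteration helper: walk the image in fixed-size tiles, clamp at the borders, and hand each tile to a kernel. This is a minimal sketch with illustrative names, but it shows why the abstraction is reusable: alignment, merge, and future SIMD kernels can all plug into the same traversal.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>

struct Tile {
  std::size_t x, y, w, h;  // tile origin and clamped extent
};

// Visit every tile of a width x height image, shrinking edge tiles
// so the kernel never reads out of bounds.
void for_each_tile(std::size_t width, std::size_t height, std::size_t tile,
                   const std::function<void(const Tile&)>& kernel) {
  for (std::size_t y = 0; y < height; y += tile) {
    for (std::size_t x = 0; x < width; x += tile) {
      kernel({x, y, std::min(tile, width - x), std::min(tile, height - y)});
    }
  }
}
```

Fixed tile sizes also keep working sets cache-resident on mobile, which is typically a prerequisite for later SIMD optimization.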
On-Device Speaker-Awareness Pipeline
Designed a deployable perception pipeline for a wearable dual-camera prototype that detects whether any visible person is speaking toward the wearer. Translated the product goal into real-time calibration, face analysis, speaking detection, and multi-view fusion under low-power device constraints.
Focus
- Product requirement to perception formulation
- Real-time stereo calibration and fusion
- Low-power wearable deployment constraints
Ultra-Low-Power Embedded Detection Models
Developed compact embedded models for presence classification and object detection on a highly constrained low-power platform. Combined hardware-aware architecture design, profiling, quantization, and distillation to meet strict latency, memory, and power budgets.
Focus
- TinyML-style model delivery
- Hardware-aware model architecture refinement
- Quantization, distillation, and profiling
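The quantization step mentioned above can be illustrated with the simplest symmetric int8 scheme: derive a scale from the maximum absolute weight, then round-to-nearest with clamping. This is a generic sketch of the technique, not the project's specific toolchain.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

struct QuantResult {
  std::vector<int8_t> q;  // quantized weights
  float scale;            // real_value ~= q * scale
};

// Symmetric post-training int8 quantization over a weight tensor.
QuantResult quantize_int8(const std::vector<float>& w) {
  float max_abs = 0.f;
  for (float v : w) max_abs = std::max(max_abs, std::fabs(v));
  const float scale = max_abs > 0.f ? max_abs / 127.f : 1.f;
  QuantResult r{{}, scale};
  r.q.reserve(w.size());
  for (float v : w) {
    const int qi = static_cast<int>(std::lround(v / scale));
    r.q.push_back(static_cast<int8_t>(std::clamp(qi, -127, 127)));
  }
  return r;
}
```

On MCU-class hardware this kind of int8 representation is what makes strict memory and power budgets reachable; per-channel scales and calibration data refine the same idea.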
Event-Based Motion Estimation and Tracking
Advanced a pre-research effort from low-level optical flow optimization to a lightweight motion-estimation redesign and multimodal tracking. Combined event-derived motion cues with RGB tracking to improve temporal association quality.
Focus
- Event-based optical flow optimization
- Lightweight motion-estimation redesign
- Multimodal tracking with RGB plus event motion cues
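A minimal sketch of how event-derived motion cues can assist RGB tracking: shift each track by its flow vector before matching, so fast motion does not break nearest-neighbor association. The names and the matching rule here are illustrative assumptions, not the project's actual matcher.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Point {
  float x, y;
};

// For each track, predict its position using an event-derived flow
// vector, then match it to the nearest detection within max_dist.
// Returns the matched detection index per track, or -1 if none.
std::vector<int> associate(const std::vector<Point>& tracks,
                           const std::vector<Point>& flow,
                           const std::vector<Point>& detections,
                           float max_dist) {
  std::vector<int> match(tracks.size(), -1);
  for (std::size_t i = 0; i < tracks.size(); ++i) {
    const Point pred{tracks[i].x + flow[i].x, tracks[i].y + flow[i].y};
    float best = max_dist;
    for (std::size_t j = 0; j < detections.size(); ++j) {
      const float d =
          std::hypot(detections[j].x - pred.x, detections[j].y - pred.y);
      if (d < best) {
        best = d;
        match[i] = static_cast<int>(j);
      }
    }
  }
  return match;
}
```

The point of the motion cue is exactly this prediction step: without it, a fast-moving target lands outside the association gate between RGB frames.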
Event-Assisted Stereo Frame Interpolation
Designed a multi-stage pipeline for reconstructing high-frame-rate, high-resolution RGB video from stereo RGB, grayscale, and event inputs. Combined geometry, motion estimation, and interpolation to outperform a strong RGB-only baseline in internal evaluation.
Focus
- Multi-rate stereo and event-based input fusion
- Geometry-aware warp inference
- High-frame-rate RGB reconstruction