My engineering work focuses on real-time computer vision systems, multimodal perception, and efficient deployment across NVIDIA GPU/CUDA, mobile, SoC, and MCU platforms.

Overview

  • System types: C++ SDKs, real-time demos, embedded perception pipelines, and validation platforms
  • Technical themes: multimodal perception, geometry-aware vision, on-device deployment, and inference acceleration
  • Deployment targets: NVIDIA GPU/CUDA platforms, mobile, SoC, and MCU

Multimodal Pose and Hand Tracking Demo

Designed lightweight multimodal models for pose and hand tracking from grayscale and event-based input, then built real-time PC demos to showcase the system. Extended model outputs with motion cues and implemented an asynchronous pipeline to improve tracking behavior and throughput.

Focus

  • Lightweight multimodal detection and tracking
  • Real-time demo system implementation
  • Tracking-oriented output design and asynchronous execution
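The asynchronous split between a slow detector and a fast per-frame tracker can be sketched as below. This is a minimal illustration, not the demo's actual code: `detect` and `track` are hypothetical callables, and the queue-based worker stands in for whatever threading model the real pipeline uses.

```python
import queue
import threading

def run_async_pipeline(frames, detect, track, detect_every=4):
    """Run a slow detector in a worker thread on every `detect_every`-th
    frame, while the tracking loop consumes every frame and picks up fresh
    detections whenever they are ready, never blocking on the detector."""
    results = queue.Queue()

    def detector_worker():
        for idx in range(0, len(frames), detect_every):
            results.put(detect(frames[idx]))  # slow path, off the main loop

    worker = threading.Thread(target=detector_worker)
    worker.start()

    state, outputs = None, []
    for frame in frames:
        while True:  # drain any finished detections without blocking
            try:
                state = results.get_nowait()
            except queue.Empty:
                break
        outputs.append(track(state, frame))

    worker.join()
    return outputs

# Toy usage: the detector echoes its frame; the tracker records
# (latest detection it has seen, current frame) for every frame.
frames = list(range(8))
outputs = run_async_pipeline(frames, detect=lambda f: f,
                             track=lambda s, f: (s, f))
```

The key property this sketch demonstrates is that the tracking loop's throughput is decoupled from detector latency: every input frame produces a tracker output, regardless of how many detections have completed.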

Real-Time Gaze Tracking SDK

Built a real-time C++ SDK for face detection and gaze tracking supporting both RGB and event-based input. Integrated calibration, screen-space mapping, and runtime inference into a real-time PC validation platform.

Focus

  • RGB and event-based multimodal input
  • Real-time PC validation platform
  • C++ SDK interface and runtime integration
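One common way to realize calibration-driven screen-space mapping is to fit an affine map from gaze features to screen coordinates by least squares over the calibration pairs. The sketch below assumes a 2-D gaze feature per sample; the function names and feature choice are illustrative, not the SDK's API.

```python
import numpy as np

def fit_screen_mapping(gaze_feats, screen_pts):
    """Fit an affine map from 2-D gaze features to screen coordinates
    using calibration pairs, via least squares: [x y 1] @ A ~= [sx sy]."""
    g = np.asarray(gaze_feats, dtype=float)
    s = np.asarray(screen_pts, dtype=float)
    X = np.hstack([g, np.ones((len(g), 1))])   # homogeneous features
    A, *_ = np.linalg.lstsq(X, s, rcond=None)  # (3, 2) affine parameters
    return A

def map_to_screen(A, gaze_feat):
    x, y = gaze_feat
    return np.array([x, y, 1.0]) @ A

# Toy calibration: the true mapping is sx = 100*x + 10, sy = 50*y + 5.
feats = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5)]
pts = [(100 * x + 10, 50 * y + 5) for x, y in feats]
A = fit_screen_mapping(feats, pts)
pred = map_to_screen(A, (0.25, 0.75))
```

An affine fit is the simplest calibration model; real systems often layer per-user polynomial or homography refinements on top, but the fit-then-map structure stays the same.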

Mobile Multi-Frame Denoising SDK

Implemented a C++ SDK for mobile-oriented multi-frame image denoising from an existing Python reference. Designed a reusable tile-based computation abstraction to support a stable alignment-and-merge pipeline and future SIMD optimization.

Focus

  • Python-to-C++ translation for deployment
  • Reusable tile-based computation abstraction
  • Mobile-oriented SDK design
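The tile-based computation idea can be sketched as follows: each tile is processed with a halo of surrounding pixels so that neighborhood operations stay correct at tile borders, while only the inner region is written back. This is a simplified Python illustration of the abstraction, not the C++ SDK's actual interface.

```python
def iter_tiles(width, height, tile, halo):
    """Yield (outer, inner) tile bounds: `inner` is the region a tile is
    responsible for writing; `outer` adds a halo, clamped to the image."""
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            inner = (tx, ty, min(tx + tile, width), min(ty + tile, height))
            outer = (max(tx - halo, 0), max(ty - halo, 0),
                     min(tx + tile + halo, width), min(ty + tile + halo, height))
            yield outer, inner

def process_tiled(img, op, tile=4, halo=1):
    """Apply `op` tile by tile: op sees the halo-padded crop, and only the
    inner region of its result is copied back, so tiles stay independent."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for (ox0, oy0, ox1, oy1), (ix0, iy0, ix1, iy1) in iter_tiles(w, h, tile, halo):
        crop = [row[ox0:ox1] for row in img[oy0:oy1]]
        res = op(crop)  # result is in the crop's local coordinates
        for y in range(iy0, iy1):
            for x in range(ix0, ix1):
                out[y][x] = res[y - oy0][x - ox0]
    return out

# Toy usage: an identity op reassembles the image exactly.
img = [[x + 10 * y for x in range(6)] for y in range(5)]
out = process_tiled(img, op=lambda crop: crop)
```

Because each tile's working set is small and independent, this structure maps naturally onto cache-friendly loops and later SIMD or multi-threaded variants of `op`.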

On-Device Speaker-Awareness Pipeline

Designed a deployable perception pipeline for a wearable dual-camera prototype that detects whether any visible person is speaking toward the wearer. Translated the product goal into a real-time pipeline combining calibration, face analysis, speaking detection, and multi-view fusion under low-power device constraints.

Focus

  • Product requirement to perception formulation
  • Real-time stereo calibration and fusion
  • Low-power wearable deployment constraints
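The multi-view fusion step can be illustrated with a deliberately simple scoring rule: each camera contributes a "speaking toward the wearer" score that gates mouth motion by how frontal the face is, and the views are fused by taking the best one. The thresholds and the max-fusion rule here are assumptions for illustration, not the deployed logic.

```python
import math

def view_score(mouth_motion, yaw_deg, max_yaw=45.0):
    """Per-view evidence that a face is speaking *toward* the wearer:
    a mouth-motion score in [0, 1], gated by head yaw (degrees)."""
    if abs(yaw_deg) > max_yaw:
        return 0.0  # face turned away: motion is not directed at the wearer
    facing = math.cos(math.radians(yaw_deg))
    return mouth_motion * facing

def is_speaking_toward_wearer(views, threshold=0.5):
    """Fuse per-camera observations of the same person by taking the best
    view: either camera seeing a frontal, moving mouth is enough."""
    return max((view_score(m, y) for m, y in views), default=0.0) >= threshold

# Toy usage: (mouth_motion, yaw_deg) per camera.
left_cam = (0.9, 10.0)    # strong mouth motion, nearly frontal
right_cam = (0.8, 60.0)   # oblique view: gated out by yaw
```

Max-fusion is the cheapest choice under low-power constraints; a calibrated probabilistic combination of views is the natural upgrade when the power budget allows it.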

Ultra-Low-Power Embedded Detection Models

Developed compact embedded models for presence classification and object detection on a highly constrained low-power platform. Combined hardware-aware architecture design, profiling, quantization, and distillation to meet strict latency, memory, and power budgets.

Focus

  • TinyML-style model delivery
  • Hardware-aware model architecture refinement
  • Quantization, distillation, and profiling
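The quantization step can be sketched with the standard asymmetric affine scheme: map the observed float range onto signed integers via a scale and zero point. This is a generic post-training sketch, not the project's specific toolchain.

```python
def quantize_affine(weights, bits=8):
    """Asymmetric affine quantization: map the float range [min, max] onto
    signed `bits`-bit integers, returning (q, scale, zero_point)."""
    lo, hi = min(weights), max(weights)
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    scale = (hi - lo) / (qmax - qmin) or 1.0  # guard degenerate constant range
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(w / scale + zero_point))) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [(qi - zero_point) * scale for qi in q]

# Toy usage: round-trip error stays within one quantization step.
w = [-1.0, -0.5, 0.0, 0.25, 1.0]
q, s, zp = quantize_affine(w)
w_hat = dequantize(q, s, zp)
```

In practice per-channel scales, calibration data for activation ranges, and quantization-aware fine-tuning or distillation determine how much accuracy survives the 8-bit (or lower) budget; the affine mapping itself is the common core.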

Event-Based Motion Estimation and Tracking

Advanced an exploratory research effort from low-level optical flow optimization to a lightweight motion-estimation redesign and multimodal tracking. Combined event-derived motion cues with RGB tracking to improve temporal association quality.

Focus

  • Event-based optical flow optimization
  • Lightweight motion-estimation redesign
  • Multimodal tracking with RGB plus event motion cues
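The benefit of event-derived motion cues for temporal association can be sketched as follows: shift each track's box by its motion cue before IoU matching, so fast motion between frames still produces overlap. The greedy matcher and thresholds here are illustrative, not the project's tracker.

```python
def iou(a, b):
    """IoU of axis-aligned boxes (x0, y0, x1, y1)."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x1 - x0) * max(0, y1 - y0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def associate(tracks, detections, flows, min_iou=0.3):
    """Greedy association: shift each track's box by its event-derived
    motion cue (dx, dy) before matching, so fast-moving targets still
    overlap their new detection."""
    matches, used = [], set()
    for ti, (box, (dx, dy)) in enumerate(zip(tracks, flows)):
        pred = (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)
        best, best_iou = None, min_iou
        for di, det in enumerate(detections):
            if di in used:
                continue
            v = iou(pred, det)
            if v > best_iou:
                best, best_iou = di, v
        if best is not None:
            used.add(best)
            matches.append((ti, best))
    return matches
```

With a large inter-frame displacement, the raw boxes no longer overlap and RGB-only IoU association fails, while the motion-compensated prediction recovers the match, which is exactly the temporal-association gain the event cues provide.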

Event-Assisted Stereo Frame Interpolation

Designed a multi-stage pipeline for reconstructing high-frame-rate, high-resolution RGB video from stereo RGB, grayscale, and event inputs. Combined geometry, motion estimation, and interpolation to outperform a strong RGB-only baseline in internal evaluation.

Focus

  • Multi-rate stereo and event-based input fusion
  • Geometry-aware warp inference
  • High-frame-rate RGB reconstruction
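The motion-compensated core of frame interpolation can be illustrated in one dimension: warp the earlier frame forward and the later frame backward along the estimated motion, then blend at the target timestamp. This toy sketch uses a constant integer flow and ignores the stereo geometry and learned components of the full pipeline.

```python
def warp_row(row, shift):
    """Shift a 1-D signal right by `shift` pixels, replicating the edge."""
    n = len(row)
    return [row[min(max(i - shift, 0), n - 1)] for i in range(n)]

def interpolate_mid(frame0, frame1, flow, t=0.5):
    """Warp frame0 forward by t*flow and frame1 backward by (1-t)*flow,
    then blend; both warps land the moving content at time t's position."""
    fwd = warp_row(frame0, round(t * flow))
    bwd = warp_row(frame1, -round((1 - t) * flow))
    return [(1 - t) * a + t * b for a, b in zip(fwd, bwd)]

# Toy usage: an impulse moves 4 px right between frames; the interpolated
# midpoint frame should place it 2 px along.
frame0 = [0, 0, 1, 0, 0, 0, 0, 0]
frame1 = [0, 0, 0, 0, 0, 0, 1, 0]
mid = interpolate_mid(frame0, frame1, flow=4)
```

In the full system, per-pixel flow from the event stream replaces the constant shift, and stereo geometry constrains the warp; the warp-then-blend structure is what this sketch isolates.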