Projects

Selected projects.

Product and engineering work across mobile, AI, and full-stack systems.

Project

ChatSprout 2.0 — LLM-Powered Scenario-Based English Communication Coach

Product Owner & Solo Developer · Nov 2025 – Mar 2026

  • Conducted 10+ user interviews and identified the core pain point: non-native newcomers struggle not with grammar, but with context-specific expression in high-stakes scenarios (networking, disagreeing, giving feedback).
  • Designed a scenario-based training system combining an LLM with retrieval-augmented generation (RAG): built a scenario taxonomy from user research and added a fallback for low-similarity retrievals to handle edge cases and maintain user trust.
  • Optimized the retrieval pipeline to reach ~80% Top-3 scenario match accuracy; established an evaluation framework (expression accuracy, tone, latency, token cost) and kept response time within 1.5–2 s.
  • Key insight: In open-ended language training, retrieval quality drives generation quality more than model scale.
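
The low-similarity fallback and the Top-3 match metric described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the function names, the toy scenario vectors, and the 0.5 similarity threshold are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_scenarios(query_vec, scenario_vecs, k=3, min_similarity=0.5):
    """Return the top-k scenarios, or a fallback flag when the best match is weak.

    min_similarity is a hypothetical threshold chosen for illustration.
    """
    scored = sorted(
        ((cosine(query_vec, vec), name) for name, vec in scenario_vecs.items()),
        reverse=True,
    )
    top = scored[:k]
    if not top or top[0][0] < min_similarity:
        # No scenario is close enough: signal the caller to use a generic path
        # instead of generating from a poor match.
        return {"fallback": True, "scenarios": []}
    return {"fallback": False, "scenarios": [name for _, name in top]}


def top_k_accuracy(results, expected):
    """Fraction of queries whose expected scenario appears in the retrieved list."""
    hits = sum(1 for r, e in zip(results, expected) if e in r["scenarios"])
    return hits / len(expected) if expected else 0.0


# Toy, hand-made embeddings standing in for real scenario vectors.
SCENARIOS = {
    "networking": [1.0, 0.0, 0.0],
    "disagreeing": [0.0, 1.0, 0.0],
    "giving_feedback": [0.0, 0.0, 1.0],
}
```

Calling `retrieve_scenarios([0.9, 0.1, 0.0], SCENARIOS, k=1)` selects `networking`, while a query with no usable signal, such as `[0.0, 0.0, 0.0]`, triggers the fallback branch.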

Project

ML Training Data Generation Pipeline (Android Malware Detection)

System Design · Sep 2025 – Dec 2025

  • Framed the core problem: security researchers need not just “a dataset” but a reproducible, extensible capability to continuously generate high-quality training data for iterative model improvement.
  • Redesigned one-off data processing scripts into a reusable data generation pipeline supporting the full train-to-evaluate loop for malware detection models.
  • Designed for diversity (17+ obfuscation strategies), controllability (dual-pipeline architecture with stage-level monitoring), and reliability (retry + fallback mechanisms, 82% success rate at scale).
  • Processed 12,000+ Android malware samples and generated 7,900+ obfuscated variants; the system was reused by subsequent research teams, significantly reducing data preparation cycles.
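
The retry + fallback behavior mentioned above can be illustrated with a small sketch. It assumes a simple policy (retry the current strategy a fixed number of times, then fall back to the next one); the strategy functions, the attempt count, and the result shape are hypothetical and not taken from the actual pipeline.

```python
import time


def run_with_retry(sample, strategies, max_attempts=3, delay_s=0.0):
    """Try each strategy in order, retrying a failing one before falling back."""
    errors = []
    for strategy in strategies:
        for attempt in range(1, max_attempts + 1):
            try:
                output = strategy(sample)
                return {"ok": True, "strategy": strategy.__name__,
                        "output": output, "errors": errors}
            except Exception as exc:  # record the failure and keep going
                errors.append(f"{strategy.__name__} attempt {attempt}: {exc}")
                if delay_s:
                    time.sleep(delay_s)
    # Every strategy exhausted its attempts.
    return {"ok": False, "strategy": None, "output": None, "errors": errors}


def success_rate(results):
    """Share of samples for which some strategy succeeded."""
    return sum(r["ok"] for r in results) / len(results) if results else 0.0


# Hypothetical stand-ins for real obfuscation stages.
def rename_identifiers(sample):
    raise RuntimeError("tool crashed")


def insert_junk_code(sample):
    return sample + "+junk"
```

With `run_with_retry("app.apk", [rename_identifiers, insert_junk_code])`, the first strategy fails three times and the second succeeds; the collected `errors` list is what stage-level monitoring would consume in a design like this.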

Project

Community Assistant

React, Node.js, PostgreSQL, Redis, Docker, AWS

  • Built a containerized full-stack platform for community request and volunteer appointment management with JWT authentication and 18+ RESTful API endpoints.
  • Implemented a Redis caching layer with a 60 s TTL and write-through invalidation to reduce redundant database queries, and exposed real-time cache metrics.
  • Dockerized services with Docker Compose, deployed to AWS EC2, and automated testing and deployment with GitHub Actions CI/CD.
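
The caching behavior described above (a 60 s TTL, invalidation on write, and hit/miss metrics) can be sketched with an in-memory stand-in for Redis. This is an assumption-laden illustration, not the project's code: the `TTLCache` class and its method names are hypothetical. Against a real Redis server, the same idea would typically map to `SET key value EX 60` on a miss and `DEL key` after a write.

```python
import time


class TTLCache:
    """In-memory stand-in for a Redis cache with per-key expiry and counters."""

    def __init__(self, ttl_s=60.0, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock      # injectable so expiry can be tested without sleeping
        self.store = {}         # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key, load):
        """Return the cached value, calling load(key) on a miss or after expiry."""
        now = self.clock()
        entry = self.store.get(key)
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        self.misses += 1
        value = load(key)
        self.store[key] = (value, now + self.ttl_s)
        return value

    def invalidate(self, key):
        """Drop a key after the underlying record is written."""
        self.store.pop(key, None)

    def metrics(self):
        """Hit/miss counters and hit rate, suitable for a metrics endpoint."""
        total = self.hits + self.misses
        return {"hits": self.hits, "misses": self.misses,
                "hit_rate": self.hits / total if total else 0.0}
```

A write path in this sketch would update the database first and then call `invalidate(key)`, so the next read repopulates the cache with the fresh value.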