Selected projects.
Product and engineering work across mobile, AI, and full-stack systems.
ChatSprout 2.0 — LLM-Powered Scenario-Based English Communication Coach
- Conducted 10+ user interviews and identified the core pain point: non-native newcomers struggle not with grammar, but with context-specific expression in high-stakes scenarios (networking, disagreeing, giving feedback).
- Designed a scenario-based training system using an LLM with retrieval-augmented generation (RAG): built a scenario taxonomy from user research and added a low-similarity fallback mechanism to handle edge cases and maintain user trust.
- Optimized the retrieval pipeline to reach ~80% Top-3 scenario match accuracy; established an evaluation framework (expression accuracy, tone, latency, token cost) and kept response time within 1.5–2 s.
- Key insight: In open-ended language training, retrieval quality drives generation quality more than model scale.
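The low-similarity fallback and the Top-3 match metric described above can be illustrated with a minimal Python sketch. It assumes scenario embeddings have already been computed by some embedding model; the names `Scenario`, `retrieve_scenarios`, `FALLBACK` and the 0.75 similarity threshold are hypothetical and not taken from the project.

```python
import math
from dataclasses import dataclass
from typing import List

# Hypothetical name for the scenario used when no taxonomy entry matches well.
FALLBACK = "general_conversation"


@dataclass
class Scenario:
    name: str
    embedding: List[float]  # assumed to be precomputed by an embedding model


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_scenarios(query_emb: List[float], scenarios: List[Scenario],
                       k: int = 3, min_similarity: float = 0.75) -> List[str]:
    """Return the top-k scenario names, or [FALLBACK] if even the best match is weak."""
    scored = sorted(((cosine(query_emb, s.embedding), s.name) for s in scenarios),
                    reverse=True)
    if not scored or scored[0][0] < min_similarity:
        return [FALLBACK]
    return [name for _, name in scored[:k]]


def top_k_accuracy(retrieved: List[List[str]], gold: List[str]) -> float:
    """Fraction of queries whose labeled scenario appears in the retrieved list."""
    hits = sum(1 for names, label in zip(retrieved, gold) if label in names)
    return hits / len(gold) if gold else 0.0
```

With toy embeddings, a query close to one scenario returns that scenario first, while a query roughly equidistant from every scenario returns the fallback instead of forcing a weak match. Checking the threshold against the best score only (rather than filtering each candidate) keeps the decision simple: either the retrieval is trusted as a whole or the coach switches to the fallback path.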
ML Training Data Generation Pipeline (Android Malware Detection)
- Framed the core problem: security researchers don't just need “a dataset”; they need a reproducible, extensible way to continuously generate high-quality training data that supports iterative model improvement.
- Redesigned one-off data processing scripts into a reusable data generation pipeline that supports the full train-to-evaluate loop for malware detection models.
- Designed for diversity (17+ obfuscation strategies), controllability (dual-pipeline architecture with stage-level monitoring), and reliability (retry + fallback mechanisms, 82% success rate at scale).
- Processed 12,000+ Android malware samples and generated 7,900+ obfuscated variants; subsequent research teams reused the system, significantly shortening their data preparation cycles.
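The retry-plus-fallback behavior and the resulting success rate can be sketched per pipeline stage as follows. This is a hypothetical Python illustration, not the project's code: `primary` and `fallback` stand in for whatever obfuscation step a stage actually runs, and the retry count and backoff values are assumptions.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


def run_with_retry_and_fallback(
    sample: str,
    primary: Callable[[str], str],
    fallback: Optional[Callable[[str], str]] = None,
    max_retries: int = 2,
    backoff_s: float = 0.0,
) -> Optional[str]:
    """Try `primary` up to 1 + max_retries times, then `fallback` once; None if all fail."""
    for attempt in range(max_retries + 1):
        try:
            return primary(sample)
        except Exception:
            # Exponential backoff between attempts (skipped after the last one).
            if backoff_s and attempt < max_retries:
                time.sleep(backoff_s * (2 ** attempt))
    if fallback is not None:
        try:
            return fallback(sample)
        except Exception:
            pass  # both strategies failed; the caller records this sample as a failure
    return None


def process_batch(
    samples: List[str],
    primary: Callable[[str], str],
    fallback: Optional[Callable[[str], str]] = None,
    max_retries: int = 2,
) -> Tuple[Dict[str, str], float]:
    """Run every sample through one stage and report the fraction that succeeded."""
    outputs: Dict[str, str] = {}
    for s in samples:
        result = run_with_retry_and_fallback(s, primary, fallback, max_retries)
        if result is not None:
            outputs[s] = result
    success_rate = len(outputs) / len(samples) if samples else 0.0
    return outputs, success_rate
```

Failures are isolated per sample, so one broken input lowers the success rate without stopping the batch; that is what makes a figure like a per-stage success rate measurable at all.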
Community Assistant
React, Node.js, PostgreSQL, Redis, Docker, AWS
- Built a containerized full-stack platform for managing community requests and volunteer appointments, with JWT authentication and 18+ RESTful API endpoints.
- Implemented a Redis caching layer with a 60 s TTL and write-through invalidation to reduce redundant database queries, and exposed real-time cache metrics.
- Dockerized services with Docker Compose, deployed to AWS EC2, and automated testing and deployment with GitHub Actions CI/CD.
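The caching behavior can be sketched as a cache-aside layer with a 60-second TTL that is invalidated on every write. The sketch is in Python for consistency with the other examples, although the project itself uses Node.js. An in-memory `TTLCache` class stands in for Redis so the example runs without a server, and `RequestRepository` is a hypothetical data-access class. This sketch invalidates by deleting the cached key after the database write; a strict write-through cache would instead write the new value into the cache.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for Redis GET / SETEX / DEL semantics (not redis-py)."""

    def __init__(self, ttl_s: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is not None and entry[0] > self.clock():
            self.hits += 1
            return entry[1]
        self._store.pop(key, None)  # drop the expired entry, if any
        self.misses += 1
        return None

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (self.clock() + self.ttl_s, value)

    def delete(self, key: str) -> None:
        self._store.pop(key, None)

    def metrics(self) -> Dict[str, float]:
        total = self.hits + self.misses
        return {"hits": self.hits, "misses": self.misses,
                "hit_rate": self.hits / total if total else 0.0}


class RequestRepository:
    """Hypothetical repository: reads go through the cache, writes hit the DB then invalidate."""

    def __init__(self, db: Dict[str, dict], cache: TTLCache):
        self.db = db  # dict standing in for a PostgreSQL table
        self.cache = cache
        self.db_reads = 0

    def get_request(self, request_id: str) -> Optional[dict]:
        key = f"request:{request_id}"
        cached = self.cache.get(key)
        if cached is not None:
            return cached
        self.db_reads += 1  # cache miss: fall back to the database
        row = self.db.get(request_id)
        if row is not None:
            self.cache.set(key, row)
        return row

    def update_request(self, request_id: str, fields: dict) -> None:
        self.db[request_id] = {**self.db.get(request_id, {}), **fields}  # write the DB first
        self.cache.delete(f"request:{request_id}")  # then invalidate so the next read is fresh
```

Against a real Redis server, the `get`, `set` and `delete` calls would correspond to the `GET`, `SETEX` (or `SET` with `EX`) and `DEL` commands, and the hit/miss counters are one simple way to expose cache metrics.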