Gemini 2.5 Native Audio: real-time audio-to-audio with interrupt detection, no STT/TTS pipeline
Computer vision: face authentication (MediaPipe Face Landmarker), gesture-controlled UI (pinch, grab, release)
Parametric CAD generation from voice prompts (build123d to STL) with iterative refinement
3D print pipeline: OrcaSlicer slicing, Moonraker/OctoPrint wireless job submission, auto printer discovery
Web agent: autonomous browser automation with Playwright for research, shopping, form filling
Smart home control: TP-Link Kasa device management (lights, switches, color/brightness) via voice
Project memory: persistent file-based context switching across sessions and workflows
Extensible tool architecture with user-confirmation guardrails for sensitive actions
Planned: robotics hardware control (ROS2 bridge), AI-powered smart glasses, localized on-device models across personal devices