This repository contains two Python projects at the intersection of Human-Computer Interaction (HCI) and Artificial Intelligence: Ada, a real-time voice assistant powered by Gemini, and a Gesture-Based Controller for media playback and system volume.
Ada is an AI voice companion built on the Google Gemini 2.5 Flash Native Audio model. Unlike traditional assistants, Ada processes raw audio streams directly, which keeps response latency low.
- Low-Latency Interaction: Uses a v1beta live connection for real-time dialogue.
- VAD & Interruption Handling: Integrated Voice Activity Detection lets you interrupt the AI mid-response, as you would in a human conversation.
- Smart Context: Configured with a witty and helpful personality ("Ada") using the 'Kore' voice profile.
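
The source does not say how Ada's Voice Activity Detection is implemented, and it may be handled by the model service itself. As a rough illustration of the idea, here is a minimal client-side energy gate that uses only the standard-library `struct` and `math` modules listed in the tech stack. The function names and the threshold of 500 are assumptions made for this sketch, not values taken from the project.

```python
import math
import struct


def rms_level(chunk: bytes) -> float:
    """Root-mean-square amplitude of 16-bit little-endian mono PCM audio."""
    sample_count = len(chunk) // 2  # two bytes per 16-bit sample
    if sample_count == 0:
        return 0.0
    samples = struct.unpack(f"<{sample_count}h", chunk[: sample_count * 2])
    return math.sqrt(sum(s * s for s in samples) / sample_count)


def is_speech(chunk: bytes, threshold: float = 500.0) -> bool:
    """Treat a chunk as speech when its RMS level reaches the threshold.

    The default threshold is a hypothetical starting point; a real
    microphone would need calibration against its own noise floor.
    """
    return rms_level(chunk) >= threshold


# Synthetic chunks standing in for microphone frames.
silence = struct.pack("<4h", 0, 0, 0, 0)
loud = struct.pack("<4h", 1000, -1000, 1000, -1000)
```

In an interruption-handling loop, a check like `is_speech(frame)` on incoming microphone frames could be the signal to stop playing the assistant's current reply.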
The Gesture-Based Controller is a computer vision application that lets you control Spotify playback and Windows volume with hand gestures, which is useful for touch-free control while working or gaming.
| Gesture | Action |
|---|---|
| Pinch (Index + Thumb) | Dynamic Volume Adjustment (0% - 100%) |
| Fist (Closed Hand) | Mute Spotify and Lock Controls |
| German Three (3 Fingers) | Max Volume (100%) and Lock Controls |
| Full Palm (5 Fingers) | Unlock Controls |
| Thumb Up (Like) | Lock Current Volume Level |
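
The gesture table can be read as a small state machine. The sketch below shows one way to express it in plain Python, without MediaPipe or Pycaw, so the logic can be tested on its own. The calibration constants, the gesture labels, and the rule that every gesture except the full palm is ignored while locked are assumptions for illustration; the project's actual thresholds and lock behaviour may differ.

```python
import math
from dataclasses import dataclass

# Hypothetical calibration: normalized thumb-to-index distances that map
# to 0% and 100% volume. Real values depend on camera and hand size.
PINCH_MIN = 0.03
PINCH_MAX = 0.25


def pinch_to_volume(thumb_tip, index_tip, lo=PINCH_MIN, hi=PINCH_MAX) -> int:
    """Map the thumb-index distance linearly onto a 0-100 volume level.

    Points are (x, y) pairs in normalized image coordinates, such as
    MediaPipe hand landmarks 4 (thumb tip) and 8 (index finger tip).
    """
    dist = math.hypot(index_tip[0] - thumb_tip[0], index_tip[1] - thumb_tip[1])
    ratio = (dist - lo) / (hi - lo)
    return round(100 * min(1.0, max(0.0, ratio)))


@dataclass
class VolumeState:
    volume: int = 50
    muted: bool = False
    locked: bool = False


def apply_gesture(state: VolumeState, gesture: str, pinch_volume=None) -> VolumeState:
    """Update the state in place according to the gesture table."""
    if gesture == "palm":
        state.locked = False              # Full Palm: unlock controls
    elif state.locked:
        pass                              # assumption: ignore everything else while locked
    elif gesture == "pinch" and pinch_volume is not None:
        state.volume = pinch_volume       # Pinch: dynamic volume
        state.muted = False               # assumption: adjusting volume unmutes
    elif gesture == "fist":
        state.muted = True                # Fist: mute and lock
        state.locked = True
    elif gesture == "three":
        state.volume = 100                # German Three: max volume and lock
        state.muted = False
        state.locked = True
    elif gesture == "thumb_up":
        state.locked = True               # Thumb Up: lock current level
    return state
```

Keeping the gesture logic separate from camera capture and audio control means the lock rules can be checked with simple unit tests before wiring them to real devices.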
- Core: Python 3.10+
- AI/ML: Google GenAI SDK (Gemini 2.5 Flash), MediaPipe
- Computer Vision: OpenCV
- Audio Processing: PyAudio, Pycaw (Windows Core Audio), and the standard-library struct and math modules
- Asynchronous I/O: Asyncio
- Clone the Repository:
git clone https://github.com/JustLmr/Ada
cd ai-multimodal-workspace
- Install Dependencies:
pip install -r requirements.txt
- Configure Environment Variables:
GEMINI_API_KEY=your_google_gemini_api_key
- Run the Application:
python main.py
(Note: Ensure your microphone and camera are not being used by other applications.)
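
The source does not show how `main.py` reads `GEMINI_API_KEY`. One possible approach, sketched below under that caveat, is to read it from the process environment and fail early with a clear message when it is missing. The helper name `load_api_key` is hypothetical.

```python
import os


def load_api_key(env=None, name: str = "GEMINI_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is absent.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to test without touching the real environment.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; define it before running main.py")
    return key
```

Failing at startup avoids a confusing authentication error later, after the microphone and camera have already been opened.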