JustLmr/Ada

🚀 AI Multimodal Workspace: Ada & Gesture Control

This repository contains two Python projects that combine Human-Computer Interaction (HCI) and Artificial Intelligence: Ada, a real-time voice assistant powered by Gemini, and a Gesture-Based Controller for media playback.


🎙️ 1. Ada: Real-Time Voice Assistant

Ada is an AI voice assistant built on the Google Gemini 2.5 Flash Native Audio model. Unlike traditional assistants, Ada streams raw audio to and from the model, which keeps response latency low.

Key Features

  • Low-Latency Interaction: Uses a v1beta live connection for real-time dialogue.
  • VAD & Interruption Handling: Integrated Voice Activity Detection lets you interrupt the AI naturally, as in a human conversation.
  • Smart Context: Configured with a witty and helpful personality ("Ada") using the 'Kore' voice profile.
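As a rough illustration of the "Smart Context" item, the sketch below builds a session configuration as a plain dictionary that requests audio output, the 'Kore' voice, and a persona prompt. The function name `build_live_config` and the persona wording are hypothetical and not taken from this repository; the key names follow the dictionary form that the Google GenAI SDK accepts for live sessions, and should be checked against the current SDK documentation.

```python
def build_live_config(persona: str, voice_name: str = "Kore") -> dict:
    """Return a config dict for a native-audio live session (illustrative only)."""
    return {
        # Ask the model to answer with audio rather than text.
        "response_modalities": ["AUDIO"],
        # The persona is passed as the system instruction.
        "system_instruction": persona,
        # Select one of the prebuilt voices by name.
        "speech_config": {
            "voice_config": {
                "prebuilt_voice_config": {"voice_name": voice_name}
            }
        },
    }


# Hypothetical persona text; the repository's actual prompt is not shown in the README.
ADA_PERSONA = "You are Ada, a witty and helpful voice assistant."
config = build_live_config(ADA_PERSONA)
```

With the Google GenAI SDK, a dictionary like this would typically be passed as the `config` argument of `client.aio.live.connect(model=..., config=config)`, on a client created with the `v1beta` API version. Whether Ada is wired up exactly this way is an assumption.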

🖐️ 2. Gesture-Based Spotify Controller

A computer vision application that lets you control Spotify playback and Windows volume with hand gestures. It is intended for hands-free control while working or gaming.

Control Schema

| Gesture | Action |
| --- | --- |
| Pinch (Index + Thumb) | Dynamic Volume Adjustment (0% - 100%) |
| Fist (Closed Hand) | Mute Spotify and Lock Controls |
| German Three (3 Fingers) | Max Volume (100%) and Lock Controls |
| Full Palm (5 Fingers) | Unlock Controls |
| Thumb Up (Like) | Lock Current Volume Level |
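The control schema above can be sketched as a small state machine plus a pinch-to-volume mapping. This is a minimal sketch under stated assumptions, not the repository's implementation: the gesture labels, the `VolumeState` class, the calibration constants `PINCH_MIN` and `PINCH_MAX`, and the choice that a lock blocks only pinch adjustments are all hypothetical. The fingertip coordinates are assumed to be normalized (x, y) pairs such as those MediaPipe reports for hand landmarks 4 (thumb tip) and 8 (index fingertip).

```python
import math
from dataclasses import dataclass

# Assumed calibration: normalized thumb-to-index distances for 0% and 100%.
PINCH_MIN = 0.03
PINCH_MAX = 0.25


def pinch_to_volume(thumb_tip, index_tip) -> int:
    """Map the thumb-to-index distance linearly onto 0-100, clamped at both ends."""
    distance = math.dist(thumb_tip, index_tip)
    ratio = (distance - PINCH_MIN) / (PINCH_MAX - PINCH_MIN)
    return round(100 * min(1.0, max(0.0, ratio)))


@dataclass
class VolumeState:
    volume: int = 50
    muted: bool = False
    locked: bool = False


def apply_gesture(state: VolumeState, gesture: str, pinch_volume=None) -> VolumeState:
    """Update the state for one recognized gesture, following the table above."""
    if gesture == "palm":          # Full Palm: unlock controls
        state.locked = False
    elif gesture == "fist":        # Fist: mute and lock
        state.muted, state.locked = True, True
    elif gesture == "three":       # German Three: max volume and lock
        # Assumption: going to max volume also clears a previous mute.
        state.volume, state.muted, state.locked = 100, False, True
    elif gesture == "thumb_up":    # Thumb Up: keep the current level, lock
        state.locked = True
    elif gesture == "pinch" and not state.locked and pinch_volume is not None:
        # Assumption: pinch adjustments are ignored while controls are locked.
        state.volume, state.muted = pinch_volume, False
    return state
```

In a real loop, the resulting `volume` and `muted` values would then be pushed to the audio backend (Pycaw on Windows, per the stack below); that step is omitted here.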

🛠️ Technical Stack

  • Core: Python 3.10+
  • AI/ML: Google GenAI SDK (Gemini 2.5 Flash), MediaPipe
  • Computer Vision: OpenCV
  • Audio Processing: PyAudio, Pycaw (Windows Core Audio), struct, math
  • Asynchronous I/O: asyncio
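The stack lists the standard-library `struct` and `math` modules under audio processing. One plausible use, shown here purely as an assumption, is measuring the loudness of raw microphone frames, for example as a simple client-side energy gate. The function names `rms_level` and `is_speech`, the 16-bit mono PCM format, and the threshold value are hypothetical; the project may instead rely entirely on the model's own voice activity detection.

```python
import math
import struct


def rms_level(frame: bytes) -> float:
    """RMS amplitude of a little-endian 16-bit mono PCM frame, scaled to 0.0-1.0."""
    sample_count = len(frame) // 2
    if sample_count == 0:
        return 0.0
    # Unpack the raw bytes into signed 16-bit integers.
    samples = struct.unpack(f"<{sample_count}h", frame[: sample_count * 2])
    mean_square = sum(s * s for s in samples) / sample_count
    return math.sqrt(mean_square) / 32768.0


def is_speech(frame: bytes, threshold: float = 0.02) -> bool:
    """Crude energy gate: treat frames at or above the threshold as speech."""
    return rms_level(frame) >= threshold
```

A frame read from a PyAudio input stream opened with a 16-bit format could be passed to `is_speech` directly; the threshold would need tuning for the actual microphone and room.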

⚙️ Installation & Setup

  1. Clone the Repository:
    git clone https://github.com/JustLmr/Ada
    cd Ada
  2. Install Dependencies:
    pip install -r requirements.txt
  3. Configure Environment Variables:
    GEMINI_API_KEY=your_google_gemini_api_key
  4. Run the Application:
    python main.py
    (Note: Ensure your microphone and camera are not being used by other applications.)
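Because step 3 depends on `GEMINI_API_KEY` being present, a small startup check can fail early with a clear message instead of a less obvious authentication error later. The helper below is a hypothetical sketch, not code from this repository; it only reads the variable named in step 3.

```python
import os


def load_api_key(env=os.environ) -> str:
    """Return the Gemini API key from the environment, or raise if it is missing."""
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set; see step 3 of Installation & Setup."
        )
    return key
```

Calling `load_api_key()` once at the top of the entry point would surface a missing key before any audio or camera device is opened.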
