Dashboard

AI Status: Idle
Twitch: Disconnected
Discord: Disconnected
VTube Software: Disconnected
OBS: Disconnected
Performance: N/A

Real-time Activity

Chat Messages

  • Waiting for chat messages...

AI Responses / Thoughts

Waiting for AI activity...
Voice Activity: Idle
Live Transcription: ...

Screen Perception (Preview)

Screen preview updates periodically based on Vision Model settings.

Configuration

Personality Studio

The name your AI VTuber will use.
Provide context for the AI's personality and knowledge.
Keywords describing the core personality.
The main instruction set for the LLM. Advanced users can use templates. Check documentation for variables.
Version: 1.0 | Manage Versions | Templating Help
Helps steer the AI away from undesirable outputs.
Few-shot examples to guide the AI's tone, style, and response format.
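For illustration, few-shot examples are often expressed as paired user/assistant messages like the sketch below; the exact field names and format the application expects are described in the documentation, so treat this structure as an assumption.

```python
# Hypothetical few-shot pairs (assumed chat-message format): each pair shows a
# sample input and the tone/style the AI should imitate in its reply.
few_shot_examples = [
    {"role": "user", "content": "How's the boss fight going?"},
    {"role": "assistant", "content": "It has flattened me five times, but I refuse to lose to a giant frog."},
    {"role": "user", "content": "Someone just subscribed!"},
    {"role": "assistant", "content": "A new subscriber?! Welcome to the chaos, legend!"},
]
```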

Model Selection & Configuration

Large Language Model (LLM)

Choose the source for the primary language model.
Specific model identifier for the selected provider/type. For local models, this might be a name (Ollama) or file path (Llama.cpp).
Required for Ollama, Llama.cpp (if running in server mode), or Custom API.

Advanced LLM parameters (temperature, top_p, repetition penalty, quantization settings, adapter paths) can be configured under 'Advanced Settings' or via the configuration file.
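As a concrete example, with a local Ollama provider the model identifier and base URL map directly onto a simple HTTP request. The sketch below uses Ollama's /api/generate endpoint with assumed example values; it is illustrative, not the application's internal client.

```python
import requests

# Assumed example values: Ollama's default base URL and a locally pulled model.
base_url = "http://localhost:11434"
model_id = "llama3"

response = requests.post(
    f"{base_url}/api/generate",
    json={"model": model_id, "prompt": "Say hello to chat.", "stream": False},
    timeout=60,
)
print(response.json()["response"])
```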

Vision Model (for Screen Perception)

Select the model for understanding screen content.
Specific model identifier for the selected vision provider.

Voice & Emotion Settings

Text-to-Speech (TTS)

Choose the engine to generate the AI's voice.
Enter the specific voice identifier or the path to a local voice model file. A voice selection dropdown populated from the chosen engine is planned (Future Feature).
1.0
1.0
Configure how AI emotions translate to TTS prosody and avatar expressions (Future Feature).
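As one possible illustration of how a voice identifier and speed value are applied, here is a minimal sketch using the pyttsx3 engine; pyttsx3 and the property mapping are assumptions, and the bundled TTS engines may expose different controls.

```python
import pyttsx3

# Assumed mapping: treat the configured speed as a multiplier on the engine's
# default rate and set the voice by identifier if one matches.
engine = pyttsx3.init()
speed_multiplier = 1.0

engine.setProperty("rate", int(engine.getProperty("rate") * speed_multiplier))
for voice in engine.getProperty("voices"):
    if "en" in voice.id.lower():
        engine.setProperty("voice", voice.id)
        break

engine.say("Hello chat, I'm live!")
engine.runAndWait()
```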

Speech-to-Text (STT)

Choose the engine to transcribe your voice input.
Identifier or path for the selected STT engine/model.
Specify a language code (e.g., 'en', 'ja') or 'auto' if supported by the engine/model.
Helps detect when speech starts and stops to reduce processing and latency. Recommended.
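For instance, a Whisper-based STT engine with the speech-detection option described above could look roughly like this; faster-whisper, the model size, and the file name are assumptions for illustration.

```python
from faster_whisper import WhisperModel

# Assumed example: a small local Whisper model with built-in VAD filtering.
model = WhisperModel("base", device="auto", compute_type="int8")

segments, info = model.transcribe(
    "mic_capture.wav",
    language="en",    # or None to auto-detect, where the model supports it
    vad_filter=True,  # skip silence to reduce processing and latency
)

print(f"Detected language: {info.language}")
for segment in segments:
    print(f"[{segment.start:.1f}s -> {segment.end:.1f}s] {segment.text}")
```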

Integration Settings

Twitch

Status: Disconnected

Discord

Status: Disconnected

VTubing Software

Default ports: VTube Studio = 8001, Warudo = 19190. Adjust if you have changed them in your VTubing software.
Status: Disconnected
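To show what a connection on those ports involves, here is a rough sketch that queries VTube Studio's public WebSocket API for its state; the websockets library and the request envelope follow VTube Studio's documented format, but this is illustrative rather than the application's actual client.

```python
import asyncio
import json
import websockets

async def check_vtube_studio(port: int = 8001) -> None:
    # VTube Studio's public API expects JSON frames with this envelope.
    request = {
        "apiName": "VTubeStudioPublicAPI",
        "apiVersion": "1.0",
        "requestID": "status-check",
        "messageType": "APIStateRequest",
    }
    async with websockets.connect(f"ws://localhost:{port}") as ws:
        await ws.send(json.dumps(request))
        reply = json.loads(await ws.recv())
        print(reply.get("messageType"), reply.get("data"))

asyncio.run(check_vtube_studio())
```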

OBS Studio

Requires the obs-websocket plugin v5.x or later installed in OBS (bundled with OBS Studio 28 and later).
Status: Disconnected
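As a sketch of what the obs-websocket 5.x connection looks like (using the obsws-python client as an assumed example; host, port, and password are whatever you configured under Tools > WebSocket Server Settings in OBS):

```python
import obsws_python as obs

# obs-websocket 5.x listens on port 4455 by default.
client = obs.ReqClient(host="localhost", port=4455, password="your-password", timeout=3)

version = client.get_version()
print("Connected to OBS", version.obs_version)

# Example action the AI might trigger (assumes a scene named "BRB" exists).
client.set_current_program_scene("BRB")
```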

Alert Platforms (Webhooks)

Allows the AI to react to follows, subscriptions, etc. Requires setup in Streamlabs.
Allows the AI to react to events. Requires setup in StreamElements.
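A generic webhook receiver for such alerts might look like the rough sketch below; Flask, the route path, and the payload fields are all assumptions, since the actual payload depends on what you configure in Streamlabs or StreamElements.

```python
from flask import Flask, request

app = Flask(__name__)

@app.route("/alerts", methods=["POST"])  # hypothetical endpoint path
def handle_alert():
    event = request.get_json(force=True)
    # Hypothetical fields; real payloads depend on the alert platform setup.
    kind = event.get("type", "unknown")
    user = event.get("username", "someone")
    print(f"Alert received: {kind} from {user}")
    return "", 204

if __name__ == "__main__":
    app.run(port=5005)  # hypothetical local port
```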

API Keys & Authentication

Enter API keys required by selected models or integrations. Keys are stored securely using the OS credential manager (e.g., Windows Credential Manager, macOS Keychain) and are not saved directly in configuration files. Fields will appear blank after saving.
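The credential-manager storage described above behaves much like the Python keyring library sketched below; keyring itself and the service/key names are assumptions for illustration.

```python
import keyring

# Store a key once; it lands in Windows Credential Manager, the macOS Keychain,
# or the Secret Service on Linux rather than in any configuration file.
keyring.set_password("ai-vtuber", "twitch_oauth_token", "oauth:xxxxxxxxxxxx")

# The application reads it back at runtime; nothing is kept in plain text,
# which is why the field appears blank after saving.
token = keyring.get_password("ai-vtuber", "twitch_oauth_token")
```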

Required for sending messages. Generate via sites like twitchapps.com/tmi/ (ensure 'chat:read' and 'chat:edit' scopes are included). Use with caution.
Create a bot application in the Discord Developer Portal.

Note: Keys for external tools (Web Search, Weather) might be configured under the 'Actions' tab if required by the tool.

Memory System (Long-Term)

Determines how the AI remembers past interactions long-term.

Vector Database Settings (if enabled)

Path for a local database (created if it doesn't exist) or URL for a remote one.
Model used to create embeddings for memory storage/retrieval (local via Ollama/SentenceTransformers or API). Affects quality and resource usage.
How text is divided before embedding (Future Feature).
Maximum number of past interactions/facts to fetch for context (RAG).
Use Maximal Marginal Relevance (MMR) or a similar technique to diversify retrieved memories and reduce redundancy.
Helps condense information and manage database size. Requires an LLM call. Leave blank to disable.
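A rough sketch of the long-term memory flow described above, using ChromaDB and a SentenceTransformers embedding model as assumed stand-ins; the database path, collection name, and example memory are illustrative, not the application's exact pipeline.

```python
import chromadb
from chromadb.utils import embedding_functions

# Assumed local setup: an on-disk database and a small embedding model.
client = chromadb.PersistentClient(path="./memory_db")
embed = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)
memories = client.get_or_create_collection("long_term_memory", embedding_function=embed)

# Store a condensed fact from a past interaction.
memories.add(
    ids=["2024-05-01-001"],
    documents=["Viewer 'pixel_cat' loves rhythm games and dislikes horror."],
)

# Retrieve the most relevant memories for the current context (RAG).
results = memories.query(
    query_texts=["What games does pixel_cat like?"],
    n_results=5,
)
for doc in results["documents"][0]:
    print(doc)
```

An MMR-style step would then re-rank these results, penalizing memories that are nearly identical to ones already selected before they are added to the prompt.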

Action & Tool Configuration

Enable and configure actions the AI can perform beyond speaking. Available actions depend on enabled integrations and plugins. Permissions should be reviewed carefully.

Available Actions:

(Checking integrations and plugins)

More actions will appear here as they are developed or added via plugins. Configuration and permission settings for actions are planned.

Screen Perception Setup

Configure how the AI perceives and understands content on your screen. Requires a Vision Model to be selected in the 'Models' tab.

How often to capture the screen for analysis (higher frequency uses more resources).
Choose which part of the screen the AI should see (Future Feature).
Limit perception to specific applications (Future Feature).
Advanced configuration for the vision model, such as OCR language hints (Future Feature).
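As an illustration of the capture step described above (the mss library and the interval value are assumptions; the application's actual capture backend may differ):

```python
import base64
import time
from io import BytesIO

import mss
from PIL import Image

CAPTURE_INTERVAL_SECONDS = 10  # assumed example; higher frequency costs more

with mss.mss() as sct:
    while True:
        shot = sct.grab(sct.monitors[1])  # primary monitor
        image = Image.frombytes("RGB", shot.size, shot.rgb)
        buffer = BytesIO()
        image.save(buffer, format="PNG")
        encoded = base64.b64encode(buffer.getvalue()).decode()
        # 'encoded' would be handed to the vision model selected on the Models tab.
        time.sleep(CAPTURE_INTERVAL_SECONDS)
```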

Advanced Settings

Influences internal trade-offs between speed, resource usage, and response quality/complexity (e.g., model choices, queue priorities).
Controls the verbosity of logs shown in the 'Application Logs' section below and saved to file.
Display intermediate steps like retrieved memory, chosen actions, or function calls in the AI Response view (if supported by the pipeline).
Number of model layers to offload to the GPU: -1 for auto/max (uses available VRAM), 0 for CPU only. Adjust based on your GPU's VRAM. Requires GPU support to be compiled/installed.
Controls randomness in LLM responses. Lower values = more deterministic, higher = more creative. (Overrides default if set).
Nucleus sampling parameter. Considers only the smallest set of most probable tokens whose cumulative probability reaches P. (Overrides default if set).
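To show how these three settings typically combine for a local model, here is a minimal sketch using the llama-cpp-python bindings; the model path and the specific values are assumed examples.

```python
from llama_cpp import Llama

# Assumed example path; -1 offloads as many layers as fit in available VRAM.
llm = Llama(model_path="./models/example-7b.Q4_K_M.gguf", n_gpu_layers=-1)

output = llm(
    "Introduce yourself to the stream in one sentence.",
    max_tokens=64,
    temperature=0.7,  # lower = more deterministic, higher = more creative
    top_p=0.9,        # nucleus sampling over the smallest token set whose
                      # cumulative probability reaches 0.9
)
print(output["choices"][0]["text"])
```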

More advanced settings may be available via direct editing of the configuration file (e.g., `config.json` or `config.yaml`). Refer to documentation.

Application Logs

Application logs will appear here...