Phone Assistant Architecture
AI voice training pipeline using CosyVoice on vast.ai GPU instances
Training Pipeline
Training Jobs
GPU Training
Orchestrates voice model training on vast.ai RTX 3090/4090 instances. Handles job queuing, monitoring, and completion.
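The queuing, monitoring, and completion steps can be pictured as a small job lifecycle. The following is a minimal sketch, not the actual orchestrator: the names `JobState`, `TrainingJob`, and `JobQueue` are hypothetical, and no vast.ai API is called.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class JobState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class TrainingJob:
    job_id: str
    voice_profile: str
    gpu_type: str = "RTX 4090"  # assumed default; the source lists RTX 3090/4090
    state: JobState = JobState.QUEUED


class JobQueue:
    """FIFO queue that tracks each job through its lifecycle (hypothetical)."""

    def __init__(self) -> None:
        self._pending = deque()  # jobs waiting for a GPU instance
        self._jobs = {}          # job_id -> TrainingJob, for status lookups

    def submit(self, job: TrainingJob) -> None:
        self._jobs[job.job_id] = job
        self._pending.append(job)

    def start_next(self) -> Optional[TrainingJob]:
        """Pop the oldest queued job and mark it as running."""
        if not self._pending:
            return None
        job = self._pending.popleft()
        job.state = JobState.RUNNING
        return job

    def finish(self, job_id: str, success: bool) -> None:
        """Record completion; only a running job can be finished."""
        job = self._jobs[job_id]
        if job.state is not JobState.RUNNING:
            raise ValueError(f"job {job_id} is not running")
        job.state = JobState.COMPLETED if success else JobState.FAILED

    def status(self, job_id: str) -> JobState:
        return self._jobs[job_id].state
```

A monitoring loop would poll `status()` for each submitted job and call `finish()` once the remote instance reports a result.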
Datasets
Training Data
Audio samples paired with transcriptions. Supports the Thorsten dataset and custom client recordings.
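As an illustration of pairing audio with transcriptions, the sketch below parses a pipe-delimited manifest (`audio_id|text`). The manifest layout is an assumption made for this example, not a format confirmed by the source, and `Sample` and `parse_manifest` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    audio_id: str  # identifier of the audio file, without extension
    text: str      # transcription of that audio


def parse_manifest(lines):
    """Split manifest lines into valid samples and rejected line numbers.

    A line is rejected when it lacks the '|' separator or when either
    the audio id or the transcription is empty. Blank lines are skipped.
    """
    samples, rejected = [], []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        parts = line.split("|", 1)
        if len(parts) != 2 or not parts[0].strip() or not parts[1].strip():
            rejected.append(line_no)
            continue
        samples.append(Sample(parts[0].strip(), parts[1].strip()))
    return samples, rejected
```

Rejecting incomplete rows before training keeps samples without a transcription out of the fine-tuning set.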
Models
CosyVoice Models
Fine-tuned CosyVoice v3 models, one per voice profile. Supports German using the Frozen-Encoder SFT method.
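One way to read "Frozen-Encoder SFT" is that encoder weights stay fixed while the remaining parameters are fine-tuned. The sketch below works under that assumption. The parameter names and the `encoder.` prefix are hypothetical and do not reflect the actual CosyVoice module layout.

```python
def split_parameters(param_names, frozen_prefixes=("encoder.",)):
    """Partition parameter names into frozen and trainable groups.

    Any name starting with one of `frozen_prefixes` is treated as frozen;
    everything else is left trainable for supervised fine-tuning.
    """
    prefixes = tuple(frozen_prefixes)
    frozen, trainable = [], []
    for name in param_names:
        if name.startswith(prefixes):
            frozen.append(name)
        else:
            trainable.append(name)
    return frozen, trainable
```

In a PyTorch training script, the frozen group would typically have `requires_grad` set to `False` so that the optimizer only updates the trainable group.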
Evaluation
Quality Testing
A/B comparison of voice outputs, similarity scoring, and quality metrics, tracked in a W&B dashboard.
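Similarity scoring and A/B comparison could be computed along these lines. This is a sketch under assumptions: it treats similarity as cosine similarity between two speaker-embedding vectors and A/B results as simple listener votes. The source does not specify either metric, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def ab_preference(votes):
    """Fraction of votes preferring output 'A' over output 'B'."""
    votes = list(votes)
    if not votes:
        raise ValueError("no votes recorded")
    if any(v not in ("A", "B") for v in votes):
        raise ValueError("votes must be 'A' or 'B'")
    return votes.count("A") / len(votes)
```

Both values are plain floats, so they can be logged as scalar metrics to whichever dashboard the pipeline uses.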
Training Flow
Audio Samples → Dataset Prep → vast.ai GPU → Training → Voice Model
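The five stages of the flow above can be sketched as an ordered chain of handlers, each receiving the state produced by the previous one. The handler interface (`dict` in, `dict` out) and the name `run_pipeline` are assumptions made for illustration.

```python
from typing import Callable, Dict, List

# Stage names taken from the training flow, in execution order.
STAGES: List[str] = [
    "Audio Samples",
    "Dataset Prep",
    "vast.ai GPU",
    "Training",
    "Voice Model",
]

Handler = Callable[[dict], dict]


def run_pipeline(handlers: Dict[str, Handler], context: dict) -> dict:
    """Run every stage in order, passing each stage's output to the next."""
    missing = [stage for stage in STAGES if stage not in handlers]
    if missing:
        raise KeyError(f"no handler for stages: {missing}")
    state = dict(context)  # copy so the caller's dict is not mutated
    completed = []
    for stage in STAGES:
        state = handlers[stage](state)
        completed.append(stage)
    state["completed_stages"] = completed
    return state
```

Checking for missing handlers up front means a misconfigured run fails before any GPU time is spent.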