AI Website Parser — Tasks

Implementation progress and upcoming work

Split website-scraper into Content Fetcher + AI Extractor services

Create database migrations for Phase 1/2 split (0285, 0286)

Extract shared post-extraction validation to post_extraction.py

Create Dockerfile.extractor with AI-only dependencies

Add discovered_pages table for tracking all internal URLs

Split dashboard sidebar into Website Scraper + AI Parser sections

Create Pipeline View page showing job flow through stages

Create Fetch Results page for Phase 1 regex data

Scale up AI extractor replicas and monitor performance

Add AI extraction duration tracking (fetched_at → completed_at)

Implement extraction quality scoring per provider

Add webhook notifications for extraction failures
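
For the duration tracking task (fetched_at → completed_at), a minimal sketch of the calculation might look like the following. The function name `extraction_duration_seconds` is hypothetical, and it assumes both timestamps are timezone-aware UTC datetimes; only the two column names come from the task list.

```python
from datetime import datetime, timezone
from typing import Optional


def extraction_duration_seconds(
    fetched_at: datetime, completed_at: Optional[datetime]
) -> Optional[float]:
    """Return seconds elapsed between fetch and extraction completion.

    Hypothetical helper: assumes both values are timezone-aware datetimes.
    Returns None while the extraction has not completed yet.
    """
    if completed_at is None:
        return None
    delta = (completed_at - fetched_at).total_seconds()
    if delta < 0:
        # A negative duration points to clock skew or a bad write.
        raise ValueError("completed_at precedes fetched_at")
    return delta


# Example with made-up timestamps (illustrative only).
fetched = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
completed = datetime(2024, 1, 1, 12, 0, 42, tzinfo=timezone.utc)
duration = extraction_duration_seconds(fetched, completed)
pending = extraction_duration_seconds(fetched, None)
```

Returning None for unfinished jobs keeps pending rows out of averages rather than counting them as zero-length extractions.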