AI Website Parser — Tasks
Implementation progress and upcoming work
Split website-scraper into Content Fetcher + AI Extractor services
Create database migrations for Phase 1/2 split (0285, 0286)
Extract shared post-extraction validation into post_extraction.py
Create Dockerfile.extractor with AI-only dependencies
Add discovered_pages table for tracking all internal URLs
Split dashboard sidebar into Website Scraper + AI Parser sections
Create Pipeline View page showing job flow through stages
Create Fetch Results page for Phase 1 regex data
Scale up AI extractor replicas and monitor performance
Add AI extraction duration tracking (fetched_at → completed_at)
Implement extraction quality scoring per provider
Add webhook notifications for extraction failures
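
A minimal sketch of the duration-tracking task listed above, assuming fetched_at and completed_at are stored as timezone-aware timestamps on each job. The function name extraction_duration_seconds and the handling of missing or out-of-order timestamps are hypothetical, not taken from the existing services.

```python
from datetime import datetime, timezone
from typing import Optional


def extraction_duration_seconds(
    fetched_at: Optional[datetime],
    completed_at: Optional[datetime],
) -> Optional[float]:
    """Return the seconds elapsed between fetch and extraction completion.

    Returns None when either timestamp is missing or when completed_at
    precedes fetched_at (assumed here to mean clock skew or bad data).
    """
    if fetched_at is None or completed_at is None:
        return None
    delta = (completed_at - fetched_at).total_seconds()
    # A negative duration is treated as unknown rather than reported.
    return delta if delta >= 0 else None


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    end = datetime(2024, 1, 1, 12, 0, 42, tzinfo=timezone.utc)
    print(extraction_duration_seconds(start, end))
```

Returning None for incomplete jobs keeps them out of averages instead of counting them as zero-length extractions.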