Sales Intelligence

Website Scraper Architecture

AI-powered extraction of employees, social profiles, and contact info from company websites

Internal Components

Service Configuration

Replicas: 5
VPN: Gluetun (Frankfurt)
Network Mode: service:vpn
AI: OpenRouter / Gemini / Solar

Database Outputs

websites: Company metadata
employees: Contact persons
social_profiles: Social media links
tracked_pages: Pages for change detection

Pipeline Position

Upstream

Result Processor (polls every 10s)

Downstream

Social Collectors, Change Detector

Component Breakdown

Page Fetcher

Fetches the main page and all internal links. Rotates the User-Agent header across 10 variants.
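
A minimal sketch of the two ideas above, internal-link discovery and User-Agent rotation, using only the standard library. The names `USER_AGENTS`, `next_headers`, and `internal_links` are hypothetical, and the three placeholder agent strings stand in for the 10 variants the service uses; the actual fetch call is left out.

```python
from html.parser import HTMLParser
from itertools import cycle
from urllib.parse import urljoin, urlparse

# Hypothetical pool of placeholder strings; the real service rotates 10 variants.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_0) ExampleBrowser/2.0",
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/3.0",
]
_ua_cycle = cycle(USER_AGENTS)


def next_headers() -> dict[str, str]:
    """Return request headers carrying the next User-Agent in the rotation."""
    return {"User-Agent": next(_ua_cycle)}


class _LinkParser(HTMLParser):
    """Collects the href attribute of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(base_url: str, html: str) -> list[str]:
    """Return unique same-host http(s) links found in html, in page order."""
    parser = _LinkParser()
    parser.feed(html)
    host = urlparse(base_url).netloc
    links: list[str] = []
    for href in parser.hrefs:
        # Resolve relative links and drop the #fragment part.
        absolute = urljoin(base_url, href).split("#", 1)[0]
        parsed = urlparse(absolute)
        if (
            parsed.scheme in ("http", "https")
            and parsed.netloc == host
            and absolute not in links
        ):
            links.append(absolute)
    return links
```

Each outgoing request would pass `next_headers()` to whatever HTTP client the fetcher uses, so consecutive requests present different agents.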

AI Extractor

Multi-provider fallback chain: OpenRouter (Qwen3-235B) → Gemini → Solar. Page content must be at least 500 characters long.
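
The provider order and content threshold can be sketched as a simple fallback loop. This is an illustration under assumptions: `extract_with_fallback` and the `Provider` callable type are hypothetical names, providers are modeled as plain functions rather than real API clients, and skipping content below the threshold (returning `None`) is an assumed behavior.

```python
from typing import Callable, Optional

MIN_CONTENT_CHARS = 500  # minimum content threshold from the description

# Hypothetical provider interface: takes page text, returns extracted fields.
Provider = Callable[[str], dict]


def extract_with_fallback(
    text: str, providers: list[tuple[str, Provider]]
) -> Optional[dict]:
    """Try each provider in order; return the first non-empty result."""
    if len(text.strip()) < MIN_CONTENT_CHARS:
        return None  # assumed: too little content to be worth an AI call
    for name, call in providers:
        try:
            result = call(text)
        except Exception:
            continue  # provider failed, fall through to the next one
        if result:
            return {"provider": name, **result}
    return None
```

The list would be passed in the order OpenRouter, Gemini, Solar, so the later providers are only called when the earlier ones fail or return nothing.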

Employee Extractor

Extracts contacts with roles, emails, and phone numbers. An anti-testimonial filter screens out "von/bei [company]" patterns.
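
One way the anti-testimonial filter could work is a regex over the extracted role text. The sketch below is an assumption about the approach, not the service's actual rule: the pattern, `looks_like_testimonial`, and `filter_employees` are hypothetical. It checks the role field rather than the name, so surnames such as "von Berg" are not rejected.

```python
import re

# "von"/"bei", an optional article, then a capitalized or numeric company token.
_TESTIMONIAL = re.compile(
    r"\b(?:von|bei)\s+(?:der\s+|dem\s+)?[A-ZÄÖÜ0-9][\w&.\-]*"
)


def looks_like_testimonial(role: str) -> bool:
    """True if the role text names an external company affiliation."""
    return bool(_TESTIMONIAL.search(role))


def filter_employees(candidates: list[dict]) -> list[dict]:
    """Keep candidates whose role does not look like a testimonial byline."""
    return [
        c for c in candidates if not looks_like_testimonial(c.get("role", ""))
    ]
```

A byline such as "Geschäftsführer bei Muster GmbH" is dropped, while a plain role such as "Leiterin Vertrieb" is kept.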

Social Link Extractor

Detects and cleans social media URLs via regex. Supports 10+ platforms.
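
A rough sketch of regex-based detection and cleanup. The host table covers only a handful of platforms for illustration (the service supports 10+), and `PLATFORM_HOSTS` and `extract_social_links` are hypothetical names. Cleaning here means dropping the `www.` prefix, query string, fragment, and trailing slash.

```python
import re
from urllib.parse import urlsplit

# Illustrative subset; the real extractor supports 10+ platforms.
PLATFORM_HOSTS = {
    "linkedin.com": "linkedin",
    "xing.com": "xing",
    "facebook.com": "facebook",
    "instagram.com": "instagram",
    "twitter.com": "twitter",
    "x.com": "twitter",
    "youtube.com": "youtube",
}

# Any http(s) URL up to whitespace, a quote, or an angle bracket.
_URL = re.compile(r"https?://[^\s\"'<>]+")


def extract_social_links(html: str) -> dict[str, str]:
    """Map platform name to the first cleaned profile URL found in html."""
    found: dict[str, str] = {}
    for raw in _URL.findall(html):
        parts = urlsplit(raw)
        host = parts.netloc.lower().removeprefix("www.")
        platform = PLATFORM_HOSTS.get(host)
        path = parts.path.rstrip("/")
        # Require a path so bare homepage links are not stored as profiles.
        if platform and path and platform not in found:
            found[platform] = f"https://{host}{path}"
    return found
```

This simple version keeps the first link per platform and does not yet exclude share or intent URLs, which a production filter would likely need to handle.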

Email Classifier

Generic emails (info@, kontakt@) → company_contact. Personal emails → employee records.
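
The routing rule can be illustrated with a lookup on the local part of the address. Only `info` and `kontakt` come from the description above; the other entries in `GENERIC_LOCAL_PARTS`, the function name, and the `"employee"` label are assumptions.

```python
# "info" and "kontakt" are from the description; the rest are assumed examples.
GENERIC_LOCAL_PARTS = {
    "info", "kontakt", "contact", "office", "mail", "hello", "support",
}


def classify_email(email: str) -> str:
    """Return "company_contact" for generic mailboxes, else "employee"."""
    local = email.strip().lower().partition("@")[0]
    return "company_contact" if local in GENERIC_LOCAL_PARTS else "employee"
```

For example, `classify_email("Kontakt@firma.de")` yields `"company_contact"`, while `classify_email("anna.schmidt@firma.de")` yields `"employee"`.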

Database Writer

Writes to the websites, employees, and social_profiles tables. Handles deduplication and registers pages in tracked_pages for change detection.
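
Deduplication on write is often pushed into the database with a uniqueness constraint. The sketch below uses an in-memory SQLite database purely for illustration; the production database engine, the column layout, and the choice of `(website_id, name)` as the duplicate key are all assumptions, as are the names `open_db` and `write_employees`.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical employees table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees ("
        " website_id INTEGER NOT NULL,"
        " name TEXT NOT NULL,"
        " email TEXT,"
        " UNIQUE (website_id, name))"  # assumed duplicate key
    )
    return conn


def _count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]


def write_employees(
    conn: sqlite3.Connection, website_id: int, people: list[dict]
) -> int:
    """Insert people, skipping duplicates; return the number of new rows."""
    before = _count(conn)
    conn.executemany(
        "INSERT OR IGNORE INTO employees (website_id, name, email)"
        " VALUES (?, ?, ?)",
        [(website_id, p["name"].strip(), p.get("email")) for p in people],
    )
    conn.commit()
    return _count(conn) - before
```

Re-running the scraper on the same site then becomes idempotent for unchanged contacts: a second call with the same people inserts nothing.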

Dashboard Pages