AI Website Parser — Database Schema
Tables and columns populated by AI extraction (Phase 2)
website_scrape_jobs (AI Extraction Columns)
Columns added for Phase 2 (AI extraction) when the scraper was split into separate fetching and extraction phases
| Column | Type | Description |
|---|---|---|
| combined_text | TEXT | Page content from Phase 1 (set to NULL after extraction) |
| extract_retry_count | INTEGER | Number of AI extraction retry attempts |
| extract_error | TEXT | Last extraction error message |
| fetched_at | TIMESTAMPTZ | When Phase 1 fetching completed |
| status | TEXT | Job status: fetched, extract_processing, completed, extract_failed |
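The status values and retry columns above suggest a small job lifecycle. The following is a minimal sketch of one way these columns could interact, not the actual implementation: the `ScrapeJob` class, the helper functions, the `MAX_EXTRACT_RETRIES` limit, and the choice to re-queue a failed attempt as `fetched` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Status values taken from the table above.
FETCHED = "fetched"
EXTRACT_PROCESSING = "extract_processing"
COMPLETED = "completed"
EXTRACT_FAILED = "extract_failed"

MAX_EXTRACT_RETRIES = 3  # assumption: the real limit is not documented


@dataclass
class ScrapeJob:
    """Hypothetical in-memory view of the Phase 2 columns of one job row."""
    combined_text: Optional[str]
    status: str = FETCHED
    extract_retry_count: int = 0
    extract_error: Optional[str] = None


def start_extraction(job: ScrapeJob) -> None:
    """Claim a fetched job for AI extraction."""
    if job.status != FETCHED:
        raise ValueError(f"cannot start extraction from status {job.status!r}")
    job.status = EXTRACT_PROCESSING


def finish_extraction(job: ScrapeJob, error: Optional[str] = None) -> None:
    """Record the outcome of one extraction attempt."""
    if job.status != EXTRACT_PROCESSING:
        raise ValueError(f"cannot finish extraction from status {job.status!r}")
    if error is None:
        job.status = COMPLETED
        job.combined_text = None  # page content is cleared after extraction
        job.extract_error = None
        return
    job.extract_error = error  # keep only the last error message
    job.extract_retry_count += 1
    if job.extract_retry_count >= MAX_EXTRACT_RETRIES:
        job.status = EXTRACT_FAILED
    else:
        job.status = FETCHED  # assumption: failed attempts are re-queued
```

In this sketch a job that fails once returns to `fetched` with `extract_retry_count` at 1, and a later successful attempt moves it to `completed` and clears `combined_text`.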
employees
People extracted by AI from company websites
| Column | Type | Description |
|---|---|---|
| id | BIGSERIAL PK | Unique identifier |
| website_id | BIGINT FK | Reference to websites table |
| first_name / last_name | TEXT | Person's name |
| job_title | TEXT | Position title |
|  | TEXT | Personal email address |
| phone | TEXT | Phone number |
| seniority_level | TEXT | C-level, VP, Director, Manager, etc. |
| department | TEXT | Department classification |
social_profiles
Social media profiles linked to employees or companies
| Column | Type | Description |
|---|---|---|
| id | BIGSERIAL PK | Unique identifier |
| employee_id | BIGINT FK | Reference to employees (nullable) |
| website_id | BIGINT FK | Reference to websites |
| platform | TEXT | linkedin, xing, instagram, facebook, x, etc. |
| url | TEXT | Full profile URL |
| username | TEXT | Platform username/handle |
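Because `employee_id` is nullable, a profile row belongs either to a specific employee or to the company as a whole. The sketch below illustrates that distinction with an in-memory SQLite database. It is a simplified stand-in, not the production schema: SQLite's `INTEGER PRIMARY KEY` replaces `BIGSERIAL`, only a subset of columns is created, and the sample rows and the `profiles_for_website` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        website_id INTEGER,
        first_name TEXT,
        last_name TEXT
    );
    CREATE TABLE social_profiles (
        id INTEGER PRIMARY KEY,
        employee_id INTEGER REFERENCES employees(id),  -- NULL for company profiles
        website_id INTEGER,
        platform TEXT,
        url TEXT,
        username TEXT
    );
    """
)

# Illustrative sample data: one employee, one company profile, one personal profile.
conn.execute(
    "INSERT INTO employees (id, website_id, first_name, last_name) VALUES (?, ?, ?, ?)",
    (1, 10, "Ada", "Example"),
)
conn.executemany(
    "INSERT INTO social_profiles (employee_id, website_id, platform, url, username) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (None, 10, "linkedin", "https://www.linkedin.com/company/example", "example"),
        (1, 10, "linkedin", "https://www.linkedin.com/in/ada-example", "ada-example"),
    ],
)


def profiles_for_website(db: sqlite3.Connection, website_id: int) -> list:
    """Return (owner_type, first_name, last_name, platform, url) for one website."""
    cursor = db.execute(
        """
        SELECT CASE WHEN p.employee_id IS NULL THEN 'company' ELSE 'employee' END,
               e.first_name, e.last_name, p.platform, p.url
        FROM social_profiles AS p
        LEFT JOIN employees AS e ON e.id = p.employee_id
        WHERE p.website_id = ?
        ORDER BY p.id
        """,
        (website_id,),
    )
    return cursor.fetchall()
```

The `LEFT JOIN` keeps company-level profiles in the result even though they have no matching employee row; an inner join would silently drop them.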
job_listings
Job openings extracted from career pages
| Column | Type | Description |
|---|---|---|
| id | BIGSERIAL PK | Unique identifier |
| website_id | BIGINT FK | Reference to websites |
| title | TEXT | Job title |
| department | TEXT | Department |
| location | TEXT | Job location |
| seniority | TEXT | Seniority level |
| url | TEXT | URL to the job posting |
websites (AI-populated fields)
Company metadata populated by AI extraction
| Column | Type | Description |
|---|---|---|
| company_name | TEXT | Extracted company name |
| company_description | TEXT | AI-generated description |
| industry | TEXT | Industry classification |
| estimated_revenue | TEXT | Revenue range estimate |
| headcount_range | TEXT | Employee count range |
| company_contact_email | TEXT | Generic contact email (info@, kontakt@) |
| ai_model | TEXT | Which AI model processed this website |
| status | TEXT | Website status; set to scraped once AI extraction is complete |
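The schema separates generic company addresses (`company_contact_email`) from the personal addresses stored per employee. A rough sketch of how an address might be classified follows. Only the `info` and `kontakt` prefixes come from the table above; the extra prefixes and the `is_generic_contact_email` helper are assumptions for illustration, and the real extraction may leave this decision to the AI model.

```python
# Prefixes named in the table above.
DOCUMENTED_GENERIC_PREFIXES = {"info", "kontakt"}
# Assumption: other common shared-mailbox prefixes, not confirmed by the schema.
ASSUMED_GENERIC_PREFIXES = {"contact", "office", "hello"}


def is_generic_contact_email(address: str) -> bool:
    """Return True if the address looks like a shared company mailbox."""
    local, sep, domain = address.strip().lower().partition("@")
    if not sep or not local or not domain:
        return False  # not a usable email address
    return local in DOCUMENTED_GENERIC_PREFIXES | ASSUMED_GENERIC_PREFIXES
```

Under this sketch, `info@example.com` would be stored on the website row, while `jane.doe@example.com` would belong to an employee row.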