Build a job board scraping agent

Job hunting is a search-and-read problem at scale: you run the same queries every day, open dozens of postings, skim each one for fit, and try to remember which companies are worth a closer look. This guide builds a Python agent that does the legwork. Give it a profile — role, skills, location, salary range, must-haves, deal-breakers — and it chains /search (find matching postings), /scrape (pull the full job description), and /research (company background), then hands everything to an LLM that scores fit from 1-10, extracts the key requirements, flags deal-breakers, and returns a ranked shortlist with reasons to apply. Run it on a schedule and it becomes a daily job alert tuned to you.

1. What you'll build

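A Python agent that takes a job search profile and produces a ranked list of opportunities. It searches for matching postings, scrapes each full description, researches the company behind it, and has an LLM score the fit.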

2. Setup

pip install openai requests python-dotenv

Create a .env file with your two keys:

SUPERHIGHWAY_API_KEY=your_key_here
OPENAI_API_KEY=your_key_here
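
A missing or empty key otherwise only shows up later as an authentication error from one of the APIs. The sketch below is one way to fail fast at startup. `require_env` is a hypothetical helper (not part of any library), and it assumes the .env values have already been loaded into the process environment, for example with python-dotenv's `load_dotenv()`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error.

    Hypothetical helper for illustration only.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing environment variable: {name}")
    return value


# Usage (assumes the variables are already set in the environment):
# superhighway_key = require_env("SUPERHIGHWAY_API_KEY")
# openai_key = require_env("OPENAI_API_KEY")
```

Calling it once per key at the top of the script turns a confusing 401 halfway through a run into an immediate, named error.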

3. Search for job postings

Build a query from the profile and hit /search. Folding in the top few skills and the location keeps results relevant without over-constraining the search.

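import json
import os

import requests
from dotenv import load_dotenv

load_dotenv()  # read SUPERHIGHWAY_API_KEY and OPENAI_API_KEY from .env

SUPERHIGHWAY_KEY = os.getenv("SUPERHIGHWAY_API_KEY")
BASE = "https://superhighway.walls.sh"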

def search_jobs(
    role: str,
    location: str = "",
    skills: list[str] | None = None,
    limit: int = 10
) -> list[dict]:
    """Search for job postings matching a profile."""
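    # Join only the non-empty parts so a missing skill list or location
    # does not leave double spaces in the query.
    parts = [role, "job", *(skills or [])[:3], location]
    query = " ".join(p for p in parts if p)

    r = requests.get(
        f"{BASE}/search",
        params={"q": query, "limit": limit},
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
        timeout=30,
    )
    r.raise_for_status()  # surface auth or quota errors instead of parsing an error body
    return r.json().get("results", [])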
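
To see what the search step actually sends, it can help to isolate the query construction from the HTTP call. The sketch below mirrors the logic described above (role, the word "job", the top few skills, then location). `build_query` and its `max_skills` parameter are illustrative names, not part of the Superhighway API.

```python
from collections.abc import Sequence


def build_query(
    role: str,
    skills: Sequence[str] = (),
    location: str = "",
    max_skills: int = 3,
) -> str:
    """Build a search query from a profile, skipping empty parts.

    Illustrative helper: only the first `max_skills` skills are included
    so the query stays relevant without over-constraining the search.
    """
    parts = [role, "job", *list(skills)[:max_skills], location]
    return " ".join(p.strip() for p in parts if p and p.strip())


print(build_query("Senior Python Engineer",
                  ["Python", "FastAPI", "PostgreSQL", "Docker"],
                  "Remote"))
# prints: Senior Python Engineer job Python FastAPI PostgreSQL Remote

print(build_query("Data Analyst"))
# prints: Data Analyst job
```

Note that the fourth skill ("Docker") is dropped: a long keyword list tends to narrow results to postings that mention every term.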

4. Scrape the full job description

Search snippets are too thin to judge fit — you need the full requirements, responsibilities, and benefits. /scrape returns each posting as clean, LLM-ready Markdown with the nav, ads, and cookie banners stripped out. Truncate it so the prompt stays small.

def scrape_job(url: str) -> dict:
    """Scrape a job posting for the full description."""
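    r = requests.get(
        f"{BASE}/scrape",
        params={"url": url},
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
        timeout=30,
    )
    # A failed scrape yields empty content, which the pipeline skips
    data = r.json() if r.ok else {}
    return {
        "url": url,
        "title": data.get("title") or "",
        # Cap the description at 3,000 characters to keep the prompt small
        "content": (data.get("markdown") or "")[:3000],
    }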
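
A plain slice can cut the description mid-sentence or mid-list. As a possible refinement, the sketch below trims at the last paragraph break inside the character budget. `truncate_markdown` is a hypothetical helper, and the "keep at least half the budget" rule is an arbitrary choice for illustration.

```python
def truncate_markdown(text: str, max_chars: int = 3000) -> str:
    """Cut text to at most max_chars, preferring the last paragraph break.

    Hypothetical helper: falls back to a hard cut when the nearest
    paragraph break would discard more than half of the budget.
    """
    if len(text) <= max_chars:
        return text
    clipped = text[:max_chars]
    cut = clipped.rfind("\n\n")
    if cut >= max_chars // 2:
        return clipped[:cut].rstrip()
    return clipped.rstrip()


sample = "A" * 10 + "\n\n" + "B" * 10 + "\n\n" + "C" * 10
print(repr(truncate_markdown(sample, max_chars=25)))
# prints: 'AAAAAAAAAA\n\nBBBBBBBBBB'
```

In `scrape_job`, this would replace the `[:3000]` slice on the Markdown content.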

5. Research the company

A good fit score weighs more than the job description — culture, team size, and reputation matter. One /research call does a multi-source sweep and returns a synthesized summary, so the LLM can factor company context into its recommendation.

def research_company(company_name: str) -> str:
    """Get background on the company offering the role."""
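    r = requests.get(
        f"{BASE}/research",
        params={
            "q": f"{company_name} company culture engineering team size",
            "pages": 3
        },
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
        timeout=60,
    )
    if not r.ok:
        return ""  # scoring still works without company context
    data = r.json()
    # Prefer the synthesized summary; fall back to raw Markdown if it is absent
    return (data.get("synthesis") or data.get("markdown") or "")[:2000]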
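
Several postings in one run often come from the same company, and each /research call counts against your quota. One option is to memoize results by company name for the duration of a run. The wrapper below is a sketch under that assumption; `cached_by_company` is a hypothetical name, and the fake fetcher stands in for `research_company` so the example runs without network access.

```python
from typing import Callable


def cached_by_company(fetch: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a research function so each company is fetched once per run.

    Keys are case- and whitespace-insensitive; empty names skip the fetch.
    """
    cache: dict[str, str] = {}

    def wrapper(company_name: str) -> str:
        key = " ".join(company_name.lower().split())
        if not key:
            return ""
        if key not in cache:
            cache[key] = fetch(company_name)
        return cache[key]

    return wrapper


# Demo with a stand-in fetcher that records how often it is called.
calls: list[str] = []

def fake_research(name: str) -> str:
    calls.append(name)
    return f"summary for {name}"

research = cached_by_company(fake_research)
research("Acme Corp")
research("  acme   corp ")
print(len(calls))  # prints: 1
```

In the real script you would wrap the existing function once, for example `research_company = cached_by_company(research_company)`.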

6. Score fit with an LLM

Hand the scraped description and company research to the LLM along with the candidate profile. Asking for JSON output (with response_format) gives you a structured result you can sort and filter — a fit score, the top requirements, a deal-breaker flag, and an apply recommendation.

from openai import OpenAI

llm = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def score_job_fit(job: dict, company_research: str, profile: dict) -> dict | None:
    """Score a job posting against the candidate profile."""
    response = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": f"""You are a career advisor. Evaluate job fit based on this candidate profile:

Role target: {profile.get("role", "any")}
Skills: {", ".join(profile.get("skills", []))}
Experience level: {profile.get("experience_level", "any")}
Preferred location: {profile.get("location", "any")}
Salary range: {profile.get("salary_range", "not specified")}
Must-haves: {", ".join(profile.get("must_haves", []))}
Deal-breakers: {", ".join(profile.get("deal_breakers", []))}

Score fit from 1-10. Extract the top 3 requirements. Identify any deal-breakers."""
            },
            {
                "role": "user",
                "content": f"""Job: {job["title"]}
URL: {job["url"]}

Job Description:
{job["content"]}

Company Research:
{company_research}

Return JSON with: fit_score (1-10), fit_reason (string), top_requirements (list of 3 strings), deal_breaker_found (bool), deal_breaker_detail (string or null), apply_recommendation (string: "strong yes" | "yes" | "maybe" | "no")"""
            }
        ],
        response_format={"type": "json_object"}
    )
    try:
        result = json.loads(response.choices[0].message.content)
        result["url"] = job["url"]
        result["title"] = job["title"]
        return result
    except (json.JSONDecodeError, KeyError, TypeError):
        # TypeError covers an empty (None) message or a JSON reply that is not an object
        return None
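
JSON mode guarantees valid JSON, not valid values: the model can still return a score as a string, a number outside 1-10, or a recommendation outside the four allowed labels. A small normalization pass before sorting makes the downstream filter more predictable. The sketch below is one possible approach. `normalize_score` is a hypothetical helper, and defaulting an unknown recommendation to "maybe" is an assumption, not something the guide prescribes.

```python
from typing import Any, Optional

ALLOWED_RECOMMENDATIONS = {"strong yes", "yes", "maybe", "no"}


def normalize_score(raw: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Coerce an LLM scoring result into predictable types.

    Returns None when fit_score is missing or not numeric.
    """
    try:
        score = int(round(float(raw["fit_score"])))
    except (KeyError, TypeError, ValueError, OverflowError):
        return None

    flag = raw.get("deal_breaker_found")
    if isinstance(flag, str):  # e.g. "false" would otherwise be truthy
        flag = flag.strip().lower() in {"true", "yes", "1"}

    reqs = raw.get("top_requirements") or []
    if not isinstance(reqs, list):
        reqs = [reqs]

    recommendation = str(raw.get("apply_recommendation") or "").strip().lower()
    if recommendation not in ALLOWED_RECOMMENDATIONS:
        recommendation = "maybe"  # assumed neutral default

    return {
        **raw,
        "fit_score": max(1, min(10, score)),  # clamp to the 1-10 scale
        "fit_reason": str(raw.get("fit_reason") or ""),
        "top_requirements": [str(r) for r in reqs][:3],
        "deal_breaker_found": bool(flag),
        "deal_breaker_detail": raw.get("deal_breaker_detail") or "",
        "apply_recommendation": recommendation,
    }
```

Applied to the parsed result inside `score_job_fit`, this also makes `deal_breaker_detail` safe to slice, since a JSON null becomes an empty string.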

7. The full job search pipeline

Now wire the steps together: search, then for each posting scrape the description, research the company, and score fit. Drop anything below your threshold or flagged with a deal-breaker, and sort what's left by score.

def find_matching_jobs(
    profile: dict,
    min_fit_score: int = 6,
    max_jobs: int = 15
) -> list[dict]:
    """
    profile: {
        "role": "Senior Python Engineer",
        "skills": ["Python", "FastAPI", "PostgreSQL"],
        "experience_level": "senior",
        "location": "Remote",
        "salary_range": "$150k-$200k",
        "must_haves": ["remote", "Python"],
        "deal_breakers": ["on-site required", "no equity"]
    }
    """
    print(f"Searching for: {profile['role']}")

    # Step 1: Find job postings
    results = search_jobs(
        profile["role"],
        location=profile.get("location", ""),
        skills=profile.get("skills", []),
        limit=max_jobs
    )
    print(f"Found {len(results)} postings")

    scored_jobs = []
    for result in results:
        url = result["url"]
        print(f"  Processing: {result.get('title', url)[:60]}")

        # Step 2: Scrape full JD
        job = scrape_job(url)
        if not job["content"]:
            continue

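        # Guess the company from a "<role> at <company> - <site>" title.
        # Heuristic: titles without " at " give no hint, so research is skipped.
        title = result.get("title", "")
        company_hint = title.split(" at ")[-1].split(" - ")[0].strip() if " at " in title else ""

        # Step 3: Company research (runs for every posting that yields a hint)
        company_research = research_company(company_hint) if company_hint else ""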

        # Step 4: Score fit
        scored = score_job_fit(job, company_research, profile)
        if scored and not scored.get("deal_breaker_found") and scored.get("fit_score", 0) >= min_fit_score:
            scored_jobs.append(scored)
            print(f"    Qualified: {scored.get('fit_score')}/10 — {scored.get('apply_recommendation')}")
        elif scored and scored.get("deal_breaker_found"):
            # The detail may be null in the model's JSON, so guard before slicing
            print(f"    Skipped: deal-breaker - {(scored.get('deal_breaker_detail') or '')[:50]}")
        else:
            print("    Low fit score or unparseable response, skipped")

    # Sort by fit score descending
    return sorted(scored_jobs, key=lambda x: x.get("fit_score", 0), reverse=True)

def print_results(jobs: list[dict]):
    if not jobs:
        print("\nNo matching jobs found above threshold.")
        return
    print(f"\n{'='*60}")
    print(f"Found {len(jobs)} qualified matches:")
    for i, job in enumerate(jobs, 1):
        print(f"\n{i}. {job.get('title', 'Unknown')} — Score: {job.get('fit_score')}/10")
        print(f"   {job.get('url')}")
        print(f"   Recommendation: {job.get('apply_recommendation')}")
        print(f"   Why: {job.get('fit_reason', '')[:120]}")
        reqs = job.get("top_requirements", [])
        if reqs:
            print(f"   Key requirements: {', '.join(reqs[:3])}")

if __name__ == "__main__":
    MY_PROFILE = {
        "role": "Senior Python Engineer",
        "skills": ["Python", "FastAPI", "PostgreSQL", "Docker"],
        "experience_level": "senior (5+ years)",
        "location": "Remote",
        "salary_range": "$150k-$200k",
        "must_haves": ["remote", "Python"],
        "deal_breakers": ["on-site only", "no remote option"]
    }

    jobs = find_matching_jobs(MY_PROFILE, min_fit_score=6, max_jobs=15)
    print_results(jobs)

Each run prints a ranked shortlist: title, fit score, recommendation, the reason, and the top requirements pulled from the description — enough to decide which postings deserve a real application.
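
The weakest link in the pipeline is guessing the company name from the search result title. The sketch below pulls that guess into a small testable function that assumes titles shaped like "<role> at <company> - <site>" and returns an empty string otherwise, so no research call is spent on a job title. `extract_company_hint` is a hypothetical helper, and real board titles vary, so treat the pattern as a starting point.

```python
import re


def extract_company_hint(title: str) -> str:
    """Return the company from a '<role> at <company> - <site>' title.

    Returns '' when the title has no ' at ' separator.
    """
    _, sep, tail = title.rpartition(" at ")
    if not sep:
        return ""
    # Drop a trailing site name such as " - Example Jobs" or " | Careers".
    company = re.split(r"\s+[-|]\s+", tail, maxsplit=1)[0]
    return company.strip(" .,")


print(extract_company_hint("Senior Python Engineer at Acme Corp - Example Jobs"))
# prints: Acme Corp
print(repr(extract_company_hint("Senior Python Engineer - Remote")))
# prints: ''
```

If your target boards expose the company in a structured field of the search result, prefer that over parsing the title.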

8. Schedule daily job alerts

The agent is most useful on autopilot. Save the script as job_search.py, add both API keys as repository secrets, and drop it in a GitHub Actions workflow: it sweeps the boards every weekday morning, so a fresh shortlist is waiting in the run log before you start your day.

# .github/workflows/job-search.yml
name: Daily Job Search
on:
  schedule:
    - cron: '0 8 * * 1-5'  # Weekdays at 8 AM UTC
  workflow_dispatch:

jobs:
  search:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - run: pip install openai requests python-dotenv
      - run: python job_search.py
        env:
          SUPERHIGHWAY_API_KEY: ${{ secrets.SUPERHIGHWAY_API_KEY }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
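
With the workflow above, the shortlist only appears in the raw job log. GitHub Actions also exposes a `GITHUB_STEP_SUMMARY` environment variable that points to a file, and Markdown appended to that file is rendered on the run's summary page. The sketch below renders the ranked jobs as a Markdown table and writes it there when the variable is set. `render_markdown` and `write_summary` are hypothetical helpers, and the column choice is just one reasonable layout.

```python
import os
from typing import Optional


def render_markdown(jobs: list[dict]) -> str:
    """Render scored jobs as a Markdown table (one row per posting)."""
    if not jobs:
        return "No matching jobs found above threshold.\n"
    lines = [
        "| # | Score | Recommendation | Posting |",
        "|---|---|---|---|",
    ]
    for i, job in enumerate(jobs, 1):
        # Pipes in a title would break the table, so swap them out.
        title = str(job.get("title") or "Unknown").replace("|", "/")
        lines.append(
            f"| {i} | {job.get('fit_score')}/10 "
            f"| {job.get('apply_recommendation')} "
            f"| [{title}]({job.get('url')}) |"
        )
    return "\n".join(lines) + "\n"


def write_summary(jobs: list[dict], path: Optional[str] = None) -> bool:
    """Append the table to the Actions step summary; return False if unavailable."""
    path = path or os.environ.get("GITHUB_STEP_SUMMARY")
    if not path:
        return False  # running locally, nothing to write
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(render_markdown(jobs))
    return True
```

Calling `write_summary(jobs)` after `print_results(jobs)` keeps local runs unchanged, since the function does nothing when the variable is absent.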

9. Extending the agent

10. Getting your API key

Grab a free Superhighway key at /pricing (1,000 calls/month, no credit card). For an agent that provisions its own access, skip the key entirely with x402: it pays $0.002 per call in USDC on Base — no signup, no key management. See the x402 pay-per-call guide for the wallet setup.

For related builds, the lead generation agent applies the same search-scrape-score pattern to sales prospecting, and the search-and-read guide goes deeper on combining search with scraping.