Build a job board scraping agent
Job hunting is a search-and-read problem at scale: you run the same queries every day, open dozens of postings, skim each one for fit, and try to remember which companies are worth a closer look. This guide builds a Python agent that does the legwork. Give it a profile — role, skills, location, salary range, must-haves, deal-breakers — and it chains /search (find matching postings), /scrape (pull the full job description), and /research (company background), then hands everything to an LLM that scores fit from 1-10, extracts the key requirements, flags deal-breakers, and returns a ranked shortlist with reasons to apply. Run it on a schedule and it becomes a daily job alert tuned to you.
1. What you'll build
A Python agent that takes a job search profile and produces a ranked list of opportunities. It:
- Takes a profile: role, skills, location, salary range, must-haves, deal-breakers
- Searches for matching job postings across the web (via /search)
- Scrapes each posting for the full job description (via /scrape)
- Pulls company background (via /research)
- Uses an LLM to score fit, extract key requirements, and flag deal-breakers
- Outputs a ranked list of opportunities with fit explanations
2. Setup
pip install openai requests python-dotenv
Create a .env file with your two keys:
SUPERHIGHWAY_API_KEY=your_key_here
OPENAI_API_KEY=your_key_here
3. Search for job postings
Build a query from the profile and hit /search. Folding in the top few skills and the location keeps results relevant without over-constraining the search.
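import json
import os

import requests
from dotenv import load_dotenv

load_dotenv()  # read SUPERHIGHWAY_API_KEY and OPENAI_API_KEY from .env

SUPERHIGHWAY_KEY = os.getenv("SUPERHIGHWAY_API_KEY")
BASE = "https://superhighway.walls.sh"


def search_jobs(
    role: str,
    location: str = "",
    skills: list[str] | None = None,
    limit: int = 10
) -> list[dict]:
    """Search for job postings matching a profile."""
    skills_str = " ".join(skills[:3]) if skills else ""
    # split/join collapses the double spaces left by empty skills or location
    query = " ".join(f"{role} job {skills_str} {location}".split())
    r = requests.get(
        f"{BASE}/search",
        params={"q": query, "limit": limit},
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
        timeout=30,
    )
    r.raise_for_status()  # fail loudly on a bad key or exhausted quota
    return r.json().get("results", [])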
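To see what the query-building step produces, here is a minimal sketch that factors it into a standalone helper. The name `build_query` and the `max_skills` parameter are hypothetical (the function above builds the string inline); the cap of three skills mirrors the `skills[:3]` slice.

```python
from typing import Sequence


def build_query(role: str, location: str = "", skills: Sequence[str] = (),
                max_skills: int = 3) -> str:
    """Join role, the first few skills, and location into one search string."""
    parts = [role, "job", *list(skills or ())[:max_skills], location]
    # Drop empty pieces so a missing location or skill list leaves no stray spaces.
    return " ".join(p.strip() for p in parts if p and p.strip())


if __name__ == "__main__":
    print(build_query("Senior Python Engineer", "Remote",
                      ["Python", "FastAPI", "PostgreSQL", "Docker"]))
    print(build_query("Data Analyst"))
```

Capping the skills keeps the query short: every extra keyword narrows the results, so the fourth skill ("Docker" here) is left for the LLM scoring step to weigh.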
4. Scrape the full job description
Search snippets are too thin to judge fit — you need the full requirements, responsibilities, and benefits. /scrape returns each posting as clean, LLM-ready Markdown with the nav, ads, and cookie banners stripped out. Truncate it so the prompt stays small.
def scrape_job(url: str) -> dict:
    """Scrape a job posting for the full description."""
    r = requests.get(
        f"{BASE}/scrape",
        params={"url": url},
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
        timeout=30,
    )
    # On an HTTP error, return empty content so the caller can skip the posting
    data = r.json() if r.ok else {}
    return {
        "url": url,
        "title": data.get("title") or "",
        "content": (data.get("markdown") or "")[:3000],  # cap prompt size
    }
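The hard `[:3000]` slice can cut a posting mid-sentence or mid-bullet. As a sketch of a gentler alternative, the hypothetical helper below (`truncate_markdown` is not part of the guide's code) trims at the last paragraph or line break inside the limit, assuming the scraped Markdown separates sections with newlines.

```python
def truncate_markdown(text: str, max_chars: int = 3000) -> str:
    """Trim text to at most max_chars, preferring to cut at a paragraph break."""
    if len(text) <= max_chars:
        return text
    head = text[:max_chars]
    # Prefer the last blank line, then the last single newline, inside the window.
    for sep in ("\n\n", "\n"):
        cut = head.rfind(sep)
        # Only accept a break in the second half, so most of the text survives.
        if cut >= max_chars // 2:
            return head[:cut].rstrip()
    # No usable break found: fall back to a plain character cut.
    return head.rstrip()
```

It could stand in for the slice in `scrape_job`, for example `truncate_markdown(data.get("markdown") or "")`.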
5. Research the company
A good fit score weighs more than the job description — culture, team size, and reputation matter. One /research call does a multi-source sweep and returns a synthesized summary, so the LLM can factor company context into its recommendation.
def research_company(company_name: str) -> str:
    """Get background on the company offering the role."""
    r = requests.get(
        f"{BASE}/research",
        params={
            "q": f"{company_name} company culture engineering team size",
            "pages": 3
        },
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
        timeout=60,
    )
    data = r.json() if r.ok else {}
    # Prefer the synthesized summary; fall back to raw Markdown if it is missing
    return (data.get("synthesis") or data.get("markdown") or "")[:2000]
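Several postings in one run often come from the same company, and each /research call counts against your quota. One way to avoid repeat lookups, sketched here under the assumption that company names differ only in case and spacing, is a small in-memory cache. `make_cached_research` is a hypothetical wrapper, not part of the API.

```python
from typing import Callable, Dict


def make_cached_research(fetch: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a research function so each company is looked up once per run."""
    cache: Dict[str, str] = {}

    def cached(company_name: str) -> str:
        # Normalize case and whitespace so "Acme Corp" and " acme  corp" share a key.
        key = " ".join(company_name.lower().split())
        if not key:
            return ""  # nothing to research
        if key not in cache:
            cache[key] = fetch(company_name)
        return cache[key]

    return cached
```

Usage would look like `research = make_cached_research(research_company)`, then calling `research(name)` wherever the pipeline needs company background.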
6. Score fit with an LLM
Hand the scraped description and company research to the LLM along with the candidate profile. Asking for JSON output (with response_format) gives you a structured result you can sort and filter — a fit score, the top requirements, a deal-breaker flag, and an apply recommendation.
from openai import OpenAI

llm = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))


def score_job_fit(job: dict, company_research: str, profile: dict) -> dict | None:
    """Score a job posting against the candidate profile."""
    response = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": f"""You are a career advisor. Evaluate job fit based on this candidate profile:
Role target: {profile.get("role", "any")}
Skills: {", ".join(profile.get("skills", []))}
Experience level: {profile.get("experience_level", "any")}
Preferred location: {profile.get("location", "any")}
Salary range: {profile.get("salary_range", "not specified")}
Must-haves: {", ".join(profile.get("must_haves", []))}
Deal-breakers: {", ".join(profile.get("deal_breakers", []))}
Score fit from 1-10. Extract the top 3 requirements. Identify any deal-breakers."""
            },
            {
                "role": "user",
                "content": f"""Job: {job["title"]}
URL: {job["url"]}
Job Description:
{job["content"]}
Company Research:
{company_research}
Return JSON with: fit_score (1-10), fit_reason (string), top_requirements (list of 3 strings), deal_breaker_found (bool), deal_breaker_detail (string or null), apply_recommendation (string: "strong yes" | "yes" | "maybe" | "no")"""
            }
        ],
        response_format={"type": "json_object"}
    )
    try:
        # content can be None (for example on a refusal), which raises TypeError
        result = json.loads(response.choices[0].message.content)
        result["url"] = job["url"]
        result["title"] = job["title"]
        return result
    except (json.JSONDecodeError, KeyError, TypeError):
        return None
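JSON mode guarantees syntactically valid JSON, not that every field is present or has the expected type: a score may come back as the string "8", or a recommendation in a different case. The sketch below shows one way to coerce the parsed dict before sorting on it. `normalize_result` and `ALLOWED_RECOMMENDATIONS` are hypothetical names, and defaulting an unknown recommendation to "maybe" is an assumption, not something the prompt enforces.

```python
from typing import Optional

ALLOWED_RECOMMENDATIONS = ("strong yes", "yes", "maybe", "no")


def normalize_result(raw: dict) -> Optional[dict]:
    """Coerce the model's JSON into predictable types, or return None if unusable."""
    try:
        score = int(round(float(raw["fit_score"])))
    except (KeyError, TypeError, ValueError, OverflowError):
        return None  # no usable score, so the posting cannot be ranked
    reqs = raw.get("top_requirements")
    if not isinstance(reqs, list):
        reqs = []
    rec = str(raw.get("apply_recommendation") or "").strip().lower()
    return {
        "fit_score": max(1, min(10, score)),  # clamp into the 1-10 range
        "fit_reason": str(raw.get("fit_reason") or ""),
        "top_requirements": [str(r) for r in reqs[:3]],
        # Only a real JSON true counts; strings such as "false" are not trusted.
        "deal_breaker_found": raw.get("deal_breaker_found") is True,
        "deal_breaker_detail": str(raw.get("deal_breaker_detail") or ""),
        "apply_recommendation": rec if rec in ALLOWED_RECOMMENDATIONS else "maybe",
    }
```

Applied to the parsed response inside `score_job_fit`, this keeps later comparisons such as `fit_score >= min_fit_score` from failing on a string score.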
7. The full job search pipeline
Now wire the steps together: search, then for each posting scrape the description, research the company, and score fit. Drop anything below your threshold or flagged with a deal-breaker, and sort what's left by score.
def find_matching_jobs(
    profile: dict,
    min_fit_score: int = 6,
    max_jobs: int = 15
) -> list[dict]:
    """
    profile: {
        "role": "Senior Python Engineer",
        "skills": ["Python", "FastAPI", "PostgreSQL"],
        "experience_level": "senior",
        "location": "Remote",
        "salary_range": "$150k-$200k",
        "must_haves": ["remote", "Python"],
        "deal_breakers": ["on-site required", "no equity"]
    }
    """
    print(f"Searching for: {profile['role']}")

    # Step 1: Find job postings
    results = search_jobs(
        profile["role"],
        location=profile.get("location", ""),
        skills=profile.get("skills", []),
        limit=max_jobs
    )
    print(f"Found {len(results)} postings")

    scored_jobs = []
    for result in results:
        url = result.get("url")
        if not url:
            continue
        title = result.get("title") or ""
        print(f"  Processing: {(title or url)[:60]}")

        # Step 2: Scrape full JD
        job = scrape_job(url)
        if not job["content"]:
            continue

        # Guess the company from titles shaped like "Role at Company - Site";
        # without an " at " there is nothing reliable to research
        company_hint = ""
        if " at " in title:
            company_hint = title.split(" at ")[-1].split(" - ")[0].strip()

        # Step 3: Company research (skipped when no company name was found)
        company_research = research_company(company_hint) if company_hint else ""

        # Step 4: Score fit
        scored = score_job_fit(job, company_research, profile)
        if scored is None:
            print("    Skipped: could not parse the LLM response")
        elif scored.get("deal_breaker_found"):
            # deal_breaker_detail may be null, so guard before slicing
            detail = scored.get("deal_breaker_detail") or ""
            print(f"    Skipped: deal-breaker - {detail[:50]}")
        elif (scored.get("fit_score") or 0) >= min_fit_score:
            scored_jobs.append(scored)
            print(f"    Qualified: {scored.get('fit_score')}/10 - {scored.get('apply_recommendation')}")
        else:
            print("    Low fit score, skipped")

    # Sort by fit score descending
    return sorted(scored_jobs, key=lambda x: x.get("fit_score") or 0, reverse=True)


def print_results(jobs: list[dict]):
    """Print the ranked shortlist to stdout."""
    if not jobs:
        print("\nNo matching jobs found above threshold.")
        return
    print(f"\n{'='*60}")
    print(f"Found {len(jobs)} qualified matches:")
    for i, job in enumerate(jobs, 1):
        print(f"\n{i}. {job.get('title', 'Unknown')} - Score: {job.get('fit_score')}/10")
        print(f"   {job.get('url')}")
        print(f"   Recommendation: {job.get('apply_recommendation')}")
        # fit_reason may be missing or null in the model's JSON
        print(f"   Why: {(job.get('fit_reason') or '')[:120]}")
        reqs = job.get("top_requirements") or []
        if reqs:
            print(f"   Key requirements: {', '.join(str(r) for r in reqs[:3])}")


if __name__ == "__main__":
    MY_PROFILE = {
        "role": "Senior Python Engineer",
        "skills": ["Python", "FastAPI", "PostgreSQL", "Docker"],
        "experience_level": "senior (5+ years)",
        "location": "Remote",
        "salary_range": "$150k-$200k",
        "must_haves": ["remote", "Python"],
        "deal_breakers": ["on-site only", "no remote option"]
    }
    jobs = find_matching_jobs(MY_PROFILE, min_fit_score=6, max_jobs=15)
    print_results(jobs)
Each run prints a ranked shortlist: title, fit score, recommendation, the reason, and the top requirements pulled from the description — enough to decide which postings deserve a real application.
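The pipeline guesses the company name by splitting the search result title on " at ", which only works for titles shaped like "Role at Company - Site". As a sketch of that heuristic in isolation, the hypothetical helper below (`guess_company` is not part of the guide's code) returns an empty string when the pattern is absent, so a job title is never sent to /research as if it were a company. The " | " divider is an assumption about how some boards format titles.

```python
def guess_company(title: str) -> str:
    """Best-effort company name from a title like 'Role at Company - Site'."""
    head, sep, tail = title.rpartition(" at ")
    if not sep:
        return ""  # no "Role at Company" pattern, so do not guess
    # Cut trailing site names such as " - LinkedIn" or " | Indeed".
    for divider in (" - ", " | "):
        tail = tail.split(divider)[0]
    return tail.strip()


if __name__ == "__main__":
    print(guess_company("Senior Python Engineer at Acme Corp - LinkedIn"))
    print(repr(guess_company("Data Engineer | Acme")))
```

Title parsing like this stays a heuristic. Where accuracy matters, asking the LLM to return the company name alongside the fit score is likely more reliable.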
8. Schedule daily job alerts
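The agent is most useful on autopilot. Save the script as job_search.py, add both API keys as repository secrets, and drop the workflow below into your repo. It runs every weekday at 08:00 UTC (and on demand via workflow_dispatch), so a fresh shortlist is waiting in the run log before you start your day.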
# .github/workflows/job-search.yml
name: Daily Job Search
on:
  schedule:
    - cron: '0 8 * * 1-5'  # Weekdays at 8 AM UTC
  workflow_dispatch:
jobs:
  search:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - run: pip install openai requests python-dotenv
      - run: python job_search.py
        env:
          SUPERHIGHWAY_API_KEY: ${{ secrets.SUPERHIGHWAY_API_KEY }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
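On a scheduled run the shortlist only appears in the job log. GitHub Actions also exposes a `GITHUB_STEP_SUMMARY` environment variable holding a file path, and Markdown appended to that file is rendered on the run's summary page. The sketch below uses it when present and falls back to printing locally. `format_shortlist` and `publish_shortlist` are hypothetical helpers that assume the job dicts carry the fields produced by the scoring step.

```python
import os


def format_shortlist(jobs: list) -> str:
    """Render scored jobs as a Markdown numbered list."""
    if not jobs:
        return "No matching jobs above the threshold today.\n"
    lines = [f"## Job shortlist ({len(jobs)})"]
    for i, job in enumerate(jobs, 1):
        lines.append(
            f"{i}. [{job.get('title', 'Unknown')}]({job.get('url', '')}) "
            f"(fit {job.get('fit_score', '?')}/10, "
            f"{job.get('apply_recommendation', 'n/a')})"
        )
    return "\n".join(lines) + "\n"


def publish_shortlist(jobs: list) -> None:
    """Append the shortlist to the Actions job summary, or print it locally."""
    report = format_shortlist(jobs)
    summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
    if summary_path:
        # Append, so earlier steps' summaries are preserved.
        with open(summary_path, "a", encoding="utf-8") as fh:
            fh.write(report)
    else:
        print(report)
```

Calling `publish_shortlist(jobs)` at the end of the script would put the ranked links on the workflow run page without any extra workflow steps.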
9. Extending the agent
- Multi-query sweep: run several search variations ("Python engineer remote", "backend engineer Python", "software engineer FastAPI") and deduplicate by URL to widen coverage.
- Email/Slack alerts: add a notification step that fires when new high-scoring jobs appear, so you only check in when there's something worth applying to.
- Application tracking: store results in a SQLite database and re-check the same URLs weekly to detect when postings close or change.
- Cover letter draft: after scoring, feed the job description and profile back to the LLM to generate a first-draft cover letter tailored to each role.
- Recruiter research: use /search to find the hiring manager or recruiter before applying, and personalize your outreach.
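Two of these extensions, the multi-query sweep and application tracking, share one building block: a stable key per posting. The sketch below is one possible shape, not a prescribed design. `canonical_url`, `dedupe_results`, `filter_new`, and the `seen_jobs` table are all hypothetical names. It drops only `utm_*` tracking parameters, on the assumption that other query parameters may carry the job ID.

```python
import sqlite3
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Normalize a posting URL: lowercase host, no fragment, no utm_* parameters."""
    parts = urlsplit(url.strip())
    kept = [(k, v) for k, v in parse_qsl(parts.query)
            if not k.lower().startswith("utm_")]
    return urlunsplit((parts.scheme, parts.netloc.lower(),
                       parts.path.rstrip("/"), urlencode(kept), ""))


def dedupe_results(result_lists: list) -> list:
    """Merge several search result lists, keeping the first hit per canonical URL."""
    seen, merged = set(), []
    for results in result_lists:
        for item in results:
            key = canonical_url(item.get("url", ""))
            if key and key not in seen:
                seen.add(key)
                merged.append(item)
    return merged


def filter_new(conn: sqlite3.Connection, jobs: list) -> list:
    """Record jobs in a seen_jobs table and return only those not stored before."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen_jobs "
        "(url TEXT PRIMARY KEY, title TEXT, fit_score INTEGER)"
    )
    fresh = []
    for job in jobs:
        cur = conn.execute(
            "INSERT OR IGNORE INTO seen_jobs (url, title, fit_score) VALUES (?, ?, ?)",
            (canonical_url(job["url"]), job.get("title", ""), job.get("fit_score")),
        )
        if cur.rowcount == 1:  # 0 means the URL was already in the table
            fresh.append(job)
    conn.commit()
    return fresh
```

A sweep would call `search_jobs` once per query variation, pass the lists to `dedupe_results`, score the merged set, and then run `filter_new(sqlite3.connect("jobs.db"), scored)` so that alerts fire only for postings not seen on earlier runs. Note that on GitHub-hosted runners the database file would need to be persisted between runs (for example as a cache or artifact), since each run starts from a clean checkout.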
10. Getting your API key
Grab a free Superhighway key at /pricing (1,000 calls/month, no credit card). For an agent that provisions its own access, skip the key entirely with x402: it pays $0.002 per call in USDC on Base — no signup, no key management. See the x402 pay-per-call guide for the wallet setup.
For related builds, the lead generation agent applies the same search-scrape-score pattern to sales prospecting, and the search-and-read guide goes deeper on combining search with scraping.