Build a talent market research agent

Before you open a headcount, post a job, or set a salary band, someone has to answer a stack of expensive questions: how much does this role pay right now, what skills do candidates actually have, where do they live, is demand growing or cooling, and how hard will it be to find them. That intel usually comes from LinkedIn Talent Insights, Gartner, or a comp consultant — pricey and slow. This guide builds a Python agent that assembles a first-pass talent market brief automatically. It chains all four Superhighway endpoints — /research for the talent landscape, /search for salary surveys and hiring reports, /scrape for compensation data off job boards and HR publications, and /news for layoffs, hiring surges, and salary shifts — then uses an LLM to emit a structured talent brief as JSON with demand, compensation ranges, in-demand skills, and sourcing strategies.

1. What you'll build

A Python agent that takes a role, skill, or talent market topic and produces a structured talent market brief:

Synthesizes the talent landscape: demand, supply, compensation norms, and geographic distribution
Finds job postings, salary surveys, and hiring trend reports
Scrapes job boards and HR publications for current compensation and skill data
Pulls recent talent market news: layoffs, hiring waves, salary changes, and skill shortages
Uses an LLM to generate a structured talent market brief as JSON

2. Setup

pip install openai requests python-dotenv

Create a .env file with your two keys:

SUPERHIGHWAY_API_KEY=your_key_here
OPENAI_API_KEY=your_key_here
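
A startup check can make a missing key fail fast with a readable message instead of surfacing later as an authentication error. The sketch below is one way to do that. `require_env` is a hypothetical helper name, not part of Superhighway or the OpenAI SDK, and it assumes the variables are already in the process environment (for example after calling python-dotenv's `load_dotenv()`).

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the named environment variables, or raise if any is unset or empty.

    Hypothetical helper for this guide; not part of any SDK.
    """
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}
```

Calling `require_env("SUPERHIGHWAY_API_KEY", "OPENAI_API_KEY")` once at the top of the script is enough to catch a misnamed or empty entry in .env.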

3. Research the talent landscape

Start with /research, which pulls multi-source background into one synthesis — demand trends, compensation norms, geographic hubs, and the growth trajectory for the role or skill. This is the context that grounds every later step.

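import json
import os

import requests
from dotenv import load_dotenv

load_dotenv()  # load SUPERHIGHWAY_API_KEY and OPENAI_API_KEY from .env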

SUPERHIGHWAY_KEY = os.getenv("SUPERHIGHWAY_API_KEY")
BASE = "https://superhighway.walls.sh"

def research_talent_landscape(role_or_skill: str) -> str:
    """Deep talent research: demand, compensation, geographic hubs, growth."""
    r = requests.get(
        f"{BASE}/research",
        params={
            "q": f"{role_or_skill} job market demand salary hiring trends remote",
            "pages": 6
        },
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"}
    )
    data = r.json()
    # Fall back to "markdown" when "synthesis" is missing or empty.
    return (data.get("synthesis") or data.get("markdown") or "")[:3000]
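
The request functions in this guide call requests.get without a timeout or a status check, so a slow or failing endpoint either hangs the run or returns an error body that gets parsed as data. A minimal sketch of a shared wrapper follows. The name `superhighway_get` and the injectable `http_get` parameter are assumptions made for illustration, not part of the Superhighway API.

```python
from typing import Any, Callable, Optional


def superhighway_get(
    path: str,
    params: dict[str, Any],
    api_key: str,
    base: str = "https://superhighway.walls.sh",
    http_get: Optional[Callable[..., Any]] = None,
    timeout: float = 30.0,
) -> dict:
    """GET a Superhighway endpoint and return the parsed JSON body.

    Hypothetical wrapper: raises on HTTP errors instead of returning partial data.
    """
    if http_get is None:
        import requests  # imported lazily so a fake can be injected in tests

        http_get = requests.get
    resp = http_get(
        f"{base}{path}",
        params=params,
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=timeout,
    )
    resp.raise_for_status()  # surface 4xx/5xx instead of parsing an error body
    return resp.json()
```

With this in place, each endpoint function reduces to one call such as `superhighway_get("/search", {"q": query, "limit": 6}, SUPERHIGHWAY_KEY)`, and the caller decides whether to catch the exception or let the run stop.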

4. Find job data and compensation benchmarks

Two /search calls. The first hunts for salary surveys and compensation benchmarks; the second hunts for hiring trends, demand signals, and talent shortage coverage — the market-intel layer that tells you whether this role is getting harder or easier to fill.

def find_compensation_data(role_or_skill: str) -> list[dict]:
    """Find salary surveys and compensation benchmarks."""
    r = requests.get(
        f"{BASE}/search",
        params={
            "q": f"{role_or_skill} salary range 2024 2025 compensation benchmark survey",
            "limit": 6
        },
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"}
    )
    return r.json().get("results", [])

def find_market_intel(role_or_skill: str) -> list[dict]:
    """Find hiring trends, demand signals, and talent shortage reports."""
    r = requests.get(
        f"{BASE}/search",
        params={
            "q": f"{role_or_skill} job market demand hiring trends talent shortage remote",
            "limit": 6
        },
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"}
    )
    return r.json().get("results", [])

5. Scrape job boards and HR publications

/scrape turns each candidate URL into clean, LLM-ready markdown. This is where the agent pulls actual numbers — salary ranges, required skills, experience levels — out of job postings, salary survey pages, and HR publications.

def scrape_talent_page(url: str) -> dict:
    """Scrape a job board, salary survey, or HR publication page."""
    r = requests.get(
        f"{BASE}/scrape",
        params={"url": url},
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"}
    )
    data = r.json()
    return {
        "url": url,
        "title": data.get("title", ""),
        "content": data.get("markdown", "")[:2500],
    }
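
Scraped markdown is free text, so it can help to pull dollar figures out before handing pages to the LLM, if only to sanity-check the range it reports. The sketch below is a rough heuristic, not a parser. `extract_salary_mentions` is a hypothetical helper, it only recognizes US-dollar amounts written like "$120k" or "$120,000", and the `min_value` floor is an arbitrary assumption meant to drop hourly rates and unrelated prices.

```python
import re

# Matches "$120k", "$120K", "$120,000", "$95000"; US-dollar amounts only.
_MONEY = re.compile(r"\$\s?(\d{1,3}(?:,\d{3})+|\d+(?:\.\d+)?)\s?([kK])?")


def extract_salary_mentions(markdown: str, min_value: int = 10_000) -> list[int]:
    """Return sorted, de-duplicated dollar amounts found in scraped text.

    Hypothetical pre-processing helper; values below min_value are dropped.
    """
    found = set()
    for number, k_suffix in _MONEY.findall(markdown):
        value = float(number.replace(",", ""))
        if k_suffix:
            value *= 1_000  # "120k" means 120,000
        if value >= min_value:
            found.add(int(value))
    return sorted(found)
```

For example, `extract_salary_mentions("Base pay: $120k to $180k. Median $150,000. Lunch is $15.")` returns `[120000, 150000, 180000]`.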

6. Get recent talent market news

/news surfaces what just happened — layoffs, hiring surges, salary inflation, skill shortages, and remote-work shifts. This is the time-sensitive layer that a static salary survey misses.

def get_talent_news(role_or_skill: str) -> list[dict]:
    """Get recent layoffs, hiring surges, salary changes, and shortages."""
    r = requests.get(
        f"{BASE}/news",
        params={
            "q": f"{role_or_skill} hiring layoffs salary shortage remote work tech",
            "count": 6
        },
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"}
    )
    return r.json().get("articles", [])
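
News results may contain the same story more than once, for instance when several outlets carry it under the same headline. A minimal sketch of title-based de-duplication follows. `dedupe_articles` is a hypothetical helper, and it assumes each article is a dict with a `title` field, which is the field the brief generator in this guide reads.

```python
import re


def dedupe_articles(articles: list[dict]) -> list[dict]:
    """Keep the first article for each title, ignoring case and punctuation.

    Hypothetical helper; articles without a usable title are dropped.
    """
    seen, unique = set(), []
    for article in articles:
        title = article.get("title") or ""
        key = re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()
        if key and key not in seen:
            seen.add(key)
            unique.append(article)
    return unique
```

Running the list through this step before building the prompt keeps the six news slots for distinct stories.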

7. Generate the talent brief with an LLM

Now hand everything to the LLM. The system prompt forbids inventing figures and is explicit that salary and demand data are approximate — the agent summarizes what's in the sources. The output is structured JSON so it slots straight into a compensation review, a sourcing plan, or a headcount budget.

from openai import OpenAI

llm = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def generate_brief(
    role_or_skill: str,
    landscape: str,
    talent_pages: list[dict],
    news: list[dict],
    context: dict | None = None
) -> dict | None:
    """Generate a structured talent market brief."""
    context = context or {}
    level = context.get("level", "mid")
    industry = context.get("industry", "general")
    region = context.get("region", "global")

    pages_text = "\n".join(
        f"- {p['title']}: {p['content'][:400]}"
        for p in talent_pages[:5]
        if p.get("content")
    )

    news_text = "\n".join(
        f"- {n.get('title', '')} ({n.get('source', '')})"
        for n in news[:6]
    )

    note = (
        "Salary and demand data are approximate based on publicly available "
        "information. Verify with current job postings and compensation surveys "
        "before making hiring decisions."
    )

    response = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": f"""You are a talent market analyst writing a first-pass hiring brief.
Seniority: {level}. Industry: {industry}. Region: {region}.
Be specific and factual. Only use information from the provided sources.
Do not invent salary figures. If a number isn't in the sources, say that no figure is available instead of estimating one.
Salary and demand data are approximate and not a substitute for a formal compensation survey."""
            },
            {
                "role": "user",
                "content": f"""Write a talent market brief for: {role_or_skill}.

Talent Landscape & Demand Context:
{landscape[:2000]}

Job Postings, Salary Surveys & Reports:
{pages_text}

Recent Talent Market News:
{news_text}

Return JSON with:
- role_or_skill: string (the role, skill, or market researched)
- talent_demand: "high" | "medium" | "low"
- compensation_range: string (typical range, e.g. "$120k–$180k for senior IC in US tech")
- years_experience_typical: string (what the market typically requires)
- key_skills_in_demand: list of 5-7 strings (skills that appear most in postings)
- geographic_hubs: list of 3-4 strings (top locations for this role + remote availability)
- hiring_trends: string (is demand growing, shrinking, or shifting? what's driving it?)
- talent_pool_notes: string (is supply tight or loose? why? international talent? bootcamp grads?)
- recent_market_moves: list of 3 strings (from news — layoffs, hiring surges, salary changes)
- recommended_sourcing_strategies: list of 3-4 strings (where to find candidates, what to emphasize in JDs)
- data_confidence: "high" | "medium" | "low" (based on data availability)
- note: "{note}"
"""
            }
        ],
        response_format={"type": "json_object"}
    )

    try:
        brief = json.loads(response.choices[0].message.content)
        brief["note"] = note
        return brief
    # TypeError covers a response whose message content is None.
    except (json.JSONDecodeError, TypeError, KeyError):
        return None
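
JSON mode guarantees parseable JSON, not that the object has the fields the prompt asked for. One way to catch a malformed brief before it reaches a compensation review is a small schema check, sketched below. `validate_brief` and the constant names are hypothetical, and the field list is copied from the prompt above.

```python
REQUIRED_KEYS = {
    "role_or_skill", "talent_demand", "compensation_range",
    "years_experience_typical", "key_skills_in_demand", "geographic_hubs",
    "hiring_trends", "talent_pool_notes", "recent_market_moves",
    "recommended_sourcing_strategies", "data_confidence", "note",
}
LEVEL_FIELDS = ("talent_demand", "data_confidence")
LIST_FIELDS = (
    "key_skills_in_demand", "geographic_hubs",
    "recent_market_moves", "recommended_sourcing_strategies",
)
LEVELS = {"high", "medium", "low"}


def validate_brief(brief: dict) -> list[str]:
    """Return a list of schema problems; an empty list means the brief passed."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - set(brief))]
    for field in LEVEL_FIELDS:
        value = brief.get(field)
        if field in brief and not (isinstance(value, str) and value in LEVELS):
            problems.append(f"{field} must be one of {sorted(LEVELS)}")
    for field in LIST_FIELDS:
        if field in brief and not isinstance(brief[field], list):
            problems.append(f"{field} must be a list")
    return problems
```

A caller could retry the LLM call once when the list is non-empty, or lower `data_confidence` and pass the brief through with the problems attached.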

8. The full research pipeline

Wire the steps together: research the landscape, find compensation and market data, scrape the top pages, pull news, then generate the brief. The context argument lets you narrow by seniority, industry, and region.

def research_talent_market(
    role_or_skill: str,
    context: dict | None = None,
    max_pages: int = 5
) -> dict | None:
    """
    Run the full talent market research pipeline.

    context = {
        "level": "junior" | "mid" | "senior" | "staff" | "executive",
        "industry": "tech" | "finance" | "healthcare" | "manufacturing" | "general",
        "region": "US" | "EU" | "APAC" | "global",
    }
    """
    print(f"Researching talent market: {role_or_skill}")

    # Step 1: Talent landscape
    print("Researching talent landscape...")
    landscape = research_talent_landscape(role_or_skill)

    # Step 2: Find compensation data and market intel
    print("Finding compensation benchmarks and market intel...")
    results = find_compensation_data(role_or_skill) + find_market_intel(role_or_skill)

    # Dedupe by URL
    seen, candidates = set(), []
    for r in results:
        url = r.get("url")
        if url and url not in seen:
            seen.add(url)
            candidates.append(r)

    # Step 3: Scrape top pages
    print(f"Scraping {min(len(candidates), max_pages)} talent pages...")
    talent_pages = []
    for result in candidates[:max_pages]:
        page = scrape_talent_page(result["url"])
        if page["content"]:
            talent_pages.append(page)

    # Step 4: Recent news
    print("Pulling talent market news...")
    news = get_talent_news(role_or_skill)

    # Step 5: Generate brief
    print("Generating talent brief...")
    return generate_brief(role_or_skill, landscape, talent_pages, news, context)

def print_brief(brief: dict | None) -> None:
    if not brief:
        print("Could not generate brief.")
        return

    print(f"\n{'='*60}")
    print(f"Talent Brief — {brief.get('role_or_skill', 'Role')}")
    print(f"Demand: {brief.get('talent_demand', '?').upper()}  |  "
          f"Confidence: {brief.get('data_confidence', '?').upper()}")
    print(f"{'='*60}")

    print(f"\nCompensation Range:\n  {brief.get('compensation_range', '')}")
    print(f"\nTypical Experience:\n  {brief.get('years_experience_typical', '')}")

    print("\nKey Skills in Demand:")
    for skill in brief.get("key_skills_in_demand", []):
        print(f"  - {skill}")

    print("\nGeographic Hubs:")
    for hub in brief.get("geographic_hubs", []):
        print(f"  - {hub}")

    print(f"\nHiring Trends:\n  {brief.get('hiring_trends', '')}")
    print(f"\nTalent Pool Notes:\n  {brief.get('talent_pool_notes', '')}")

    moves = brief.get("recent_market_moves", [])
    if moves:
        print("\nRecent Market Moves:")
        for m in moves:
            print(f"  * {m}")

    strategies = brief.get("recommended_sourcing_strategies", [])
    if strategies:
        print("\nRecommended Sourcing Strategies:")
        for s in strategies:
            print(f"  + {s}")

    print(f"\n{brief.get('note', '')}")

if __name__ == "__main__":
    import sys

    # Usage: python agent.py "senior machine learning engineer" senior tech US
    if len(sys.argv) >= 2:
        role_arg = sys.argv[1]
        ctx = {
            "level": sys.argv[2] if len(sys.argv) > 2 else "mid",
            "industry": sys.argv[3] if len(sys.argv) > 3 else "general",
            "region": sys.argv[4] if len(sys.argv) > 4 else "global",
        }
    else:
        role_arg = "senior machine learning engineer"
        ctx = {"level": "senior", "industry": "tech", "region": "US"}

    brief = research_talent_market(role_arg, context=ctx)
    # print_brief handles None itself and prints a fallback message.
    print_brief(brief)
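
In the pipeline above, the context dict only reaches the LLM prompt; the /research, /search, and /news queries are built from the role string alone. If you want seniority, industry, and region to shape retrieval as well, one option is to fold them into the query text. The sketch below assumes that approach. `contextual_query` is a hypothetical helper, and treating "general" and "global" as no-signal defaults is a choice made here, not something the API requires.

```python
from typing import Optional


def contextual_query(role_or_skill: str, context: Optional[dict] = None) -> str:
    """Append level, industry, and region to the role string for search queries.

    Hypothetical helper; skips catch-all defaults and words already in the role.
    """
    context = context or {}
    role_words = role_or_skill.lower().split()
    terms = [role_or_skill]
    for key in ("level", "industry", "region"):
        value = context.get(key)
        if not value or value in ("general", "global"):
            continue  # catch-all values add no search signal
        if value.lower() in role_words:
            continue  # avoid "senior senior engineer"
        terms.append(value)
    return " ".join(terms)
```

Passing `contextual_query(role_or_skill, context)` to the search and news functions instead of the bare role keeps their signatures unchanged.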

9. What you can research

Role research — "senior machine learning engineer", "DevOps engineer", "product manager fintech", "data scientist healthcare".
Skill demand — "Rust programming", "LLM fine-tuning", "Kubernetes", "TypeScript React".
Market dynamics — "software engineer market 2025", "AI researcher hiring surge", "tech layoffs 2025".
Compensation benchmarks — "staff engineer salary FAANG", "startup equity compensation norms".
Remote / geo — "senior engineer remote salary premium", "engineering talent Berlin vs London".

10. Use cases

Recruiter — quickly benchmark comp and understand skill expectations before sourcing a role.
HR manager — run an annual compensation review: what are peers paying for your key roles?
Startup founder — before opening a headcount, understand what it'll cost and where to source.
VC — evaluate talent density before investing in a sector ("are there enough ML engineers for 50 AI startups?").
People ops — annual compensation planning: which roles are seeing salary pressure?

11. Extending the agent

Compensation tracker — run monthly for your 10 most-hired roles and diff compensation_range to catch market movement.
JD optimizer — feed key_skills_in_demand back into job-description drafting so you include what candidates actually search for.
Headcount planning — run for each planned hire and sum compensation ranges for budget forecasting.
Competitive sourcing — query /news?q={role}+hiring+{competitor}+team to track competitor hiring signals.
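
The compensation tracker idea can be sketched as a field-by-field comparison of two saved briefs for the same role. `diff_briefs` is a hypothetical helper and the default field list is an assumption. Because `compensation_range` and `hiring_trends` are free text written by the LLM, a rewording can show up as a change even when the underlying numbers did not move, so treat the result as a prompt for review.

```python
def diff_briefs(
    previous: dict,
    current: dict,
    fields: tuple[str, ...] = ("compensation_range", "talent_demand", "hiring_trends"),
) -> dict[str, tuple]:
    """Return {field: (old_value, new_value)} for tracked fields that changed.

    Hypothetical helper; compares values for exact equality only.
    """
    return {
        field: (previous.get(field), current.get(field))
        for field in fields
        if previous.get(field) != current.get(field)
    }
```

Storing each month's brief as JSON and running this against the prior month gives a short list of roles worth a closer look.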

12. A note on data quality

Talent market data from public sources is approximate. Salary ranges vary significantly by company size, location, and individual experience. This agent aggregates publicly available information — job boards, salary surveys, and industry reports — so always validate with current postings and HR compensation tools before making hiring decisions.

13. Getting your API key

Grab a free Superhighway key at /pricing (1,000 calls/month, no credit card). For an agent that provisions its own access, skip the key entirely with x402: it pays $0.002 per call in USDC on Base — no signup, no key management. See the x402 pay-per-call guide for the wallet setup.

For related builds, the financial research agent uses the same four-endpoint pattern on companies and markets, and the supply chain research agent applies it to suppliers and logistics risk.