Sports Analytics Research Agent

Sports is a data-rich field where a single injury, a lineup change, or a tactical adjustment can reset the analytical picture overnight — and where a fair evaluation of a team or player turns on advanced metrics, historical comparisons, and the most recent roster and form data. This guide builds a Python agent for sports analysts, team front offices, sports journalists, fantasy sports operators, scouting departments, and sports media. It chains all four Superhighway endpoints — /research for the historical and tactical landscape, /search against authoritative stats databases and recent analysis, /scrape for a specific team or player stats page, and /news for injuries and roster moves — then uses an LLM to emit a structured sports brief as JSON.

Data note: Sports statistics and injury reports change rapidly. For the most current data, verify directly with official league sources (NFL.com, NBA.com, MLB.com, or the relevant league's official stats portal), with ESPN, and with the Sports Reference sites.

Overview

The agent takes a team, player, or analytical topic — "Manchester City Premier League season analysis 2024", "Shohei Ohtani Dodgers 2024 season performance" — and produces a structured sports research brief:

Synthesizes the landscape: historical context, team/player career arc, tactical and strategic analysis, and the league landscape
Searches authoritative sports statistics databases — Sports Reference (Baseball-Reference, Pro-Football-Reference, Basketball-Reference), FBref (soccer), Hockey-Reference — for historical and advanced metrics
Searches recent analysis — scouting reports, transfer news, contract situations, coaching decisions, and tactical breakdowns
Scrapes one relevant page: a team stats page on Sports Reference, an FBref player profile, or a detailed analytics article
Pulls recent news: injury reports, lineup changes, roster moves (trades, signings, releases), coaching changes, and recent results
Uses an LLM to generate a structured brief — performance summary, advanced metrics, historical context, roster, tactics, transfer/contract context, and media narrative as JSON

Who it's for: sports analysts, team front offices, sports journalists, fantasy sports operators, scouting departments, and sports media.

How it works

Five calls across the four endpoints (/search is called twice) feed one LLM synthesis:

/research — deep synthesis: historical context, team/player career arc, tactical and strategic analysis, and the league landscape.
/search (authoritative stats) — historical and advanced metrics scoped to Sports Reference, FBref, and Basketball-Reference.
/search (recent analysis, time=month) — scouting reports, transfer news, contract situations, coaching decisions, and tactical breakdowns.
/scrape — one relevant URL, e.g. a team stats page on sports-reference.com, an FBref player profile, or a detailed analytics article.
/news (time=week) — injury reports, lineup changes, roster moves, coaching changes, and recent match/game results.
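
The five calls above can also be written down as plain data before any request is sent. This is a sketch only: `build_call_plan` is a hypothetical helper, not part of Superhighway, and the paths, methods, and parameters simply mirror the ones used in the full example in this guide.

```python
from __future__ import annotations


def build_call_plan(query: str, scrape_url: str | None = None) -> list[dict]:
    """Describe the five calls as data; nothing is sent over the network."""
    return [
        {"step": "research", "method": "GET", "path": "/research",
         "params": {"q": f"{query} sports analytics statistics performance"}},
        {"step": "stats", "method": "GET", "path": "/search",
         "params": {"q": f"{query} statistics advanced metrics"}},
        {"step": "analysis", "method": "GET", "path": "/search",
         "params": {"q": f"{query} sports analysis scouting report",
                    "time": "month"}},
        # scrape_url is only known after the two searches have returned
        {"step": "scrape", "method": "POST", "path": "/scrape",
         "json": {"url": scrape_url, "mode": "markdown"}},
        {"step": "news", "method": "GET", "path": "/news",
         "params": {"q": f"{query} sports injury roster lineup",
                    "time": "week"}},
    ]


plan = build_call_plan("Shohei Ohtani Dodgers 2024 season performance")
for call in plan:
    print(call["method"], call["path"])
```

Listing the calls this way makes the count explicit: five steps, four distinct paths, with /scrape depending on a URL found by the searches.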

Full example

pip install openai requests python-dotenv

Create a .env file with your two keys:

SUPERHIGHWAY_API_KEY=your_key_here
OPENAI_API_KEY=your_key_here
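
Before making any calls it can help to confirm that both keys are actually set. The sketch below uses only the standard library; `missing_keys` is a hypothetical helper introduced here for illustration, and it assumes the two variable names shown above.

```python
import os

REQUIRED_KEYS = ("SUPERHIGHWAY_API_KEY", "OPENAI_API_KEY")


def missing_keys(env=None) -> list[str]:
    """Return the names of required keys that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]


# Demonstration with an explicit mapping instead of the real environment
print(missing_keys({"SUPERHIGHWAY_API_KEY": "abc", "OPENAI_API_KEY": ""}))
# ['OPENAI_API_KEY']
```

In the agent itself, calling `missing_keys()` with no arguments checks the real environment, so it should run after the .env file has been loaded.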

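import json
import os

import requests
from dotenv import load_dotenv
from openai import OpenAI

# Load SUPERHIGHWAY_API_KEY and OPENAI_API_KEY from the .env file
load_dotenv()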

SUPERHIGHWAY_KEY = os.getenv("SUPERHIGHWAY_API_KEY")
BASE = "https://superhighway.walls.sh"
HEADERS = {"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"}

llm = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

NOTE = (
    "Sports statistics and injury reports change rapidly. For the most current "
    "data, verify directly with official league sources (NFL.com, NBA.com, "
    "MLB.com, or the relevant league's official stats portal), with ESPN, and "
    "with the Sports Reference sites. This tool is for research, journalism, "
    "and fan analysis, not for gambling predictions."
)

# 1. Deep synthesis of the historical & tactical landscape
def research_landscape(query: str) -> str:
    """Historical context, team/player career arc, tactics, league landscape."""
    r = requests.get(
        f"{BASE}/research",
        params={"q": f"{query} sports analytics statistics performance"},
        headers=HEADERS,
    )
    data = r.json()
    # "summary" may be missing or null; fall back to an empty string
    return (data.get("summary") or "")[:3000]

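# 2. Authoritative stats: the Sports Reference family of sites and FBref
def search_stats(query: str) -> list[dict]:
    """Historical and advanced metrics from Sports Reference sites and FBref."""
    r = requests.get(
        f"{BASE}/search",
        params={
            # Baseball-, Pro-Football-, Basketball-, and Hockey-Reference live on
            # their own domains, so each one needs its own site: filter.
            "q": f"{query} statistics advanced metrics "
                 "site:sports-reference.com OR site:baseball-reference.com "
                 "OR site:pro-football-reference.com "
                 "OR site:basketball-reference.com "
                 "OR site:hockey-reference.com OR site:fbref.com",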
        },
        headers=HEADERS,
    )
    return r.json().get("results", [])

# 3. Recent analysis: scouting, transfers, contracts (last month)
def search_analysis(query: str) -> list[dict]:
    """Scouting reports, transfer news, contracts, coaching, tactical breakdowns."""
    r = requests.get(
        f"{BASE}/search",
        params={
            "q": f"{query} sports analysis scouting report "
                 f"transfer market contract",
            "time": "month",
        },
        headers=HEADERS,
    )
    return r.json().get("results", [])

# 4. Scrape one relevant stats page / player profile / analytics article
def scrape_page(url: str) -> dict:
    """Pull a Sports Reference team page, an FBref player profile, or an article."""
    r = requests.post(
        f"{BASE}/scrape",
        json={"url": url, "mode": "markdown"},
        headers=HEADERS,
    )
    data = r.json()
    return {
        "url": url,
        "title": data.get("title") or "",
        # Prefer markdown, fall back to plain text; either key may be null
        "content": (data.get("markdown") or data.get("text") or "")[:2500],
    }

# 5. Recent sports news: injuries, lineups, roster moves (last week)
def get_news(query: str) -> list[dict]:
    """Injury reports, lineup changes, roster moves, coaching changes, results."""
    r = requests.get(
        f"{BASE}/news",
        params={
            "q": f"{query} sports injury roster lineup",
            "time": "week",
        },
        headers=HEADERS,
    )
    return r.json().get("results", [])

def generate_brief(
    query: str,
    landscape: str,
    stats: list[dict],
    analysis: list[dict],
    scraped: dict | None,
    news: list[dict],
) -> dict | None:
    """Generate a structured sports research brief as JSON."""

    stats_text = "\n".join(
        f"- {r.get('title', '')}: {r.get('snippet', '')} ({r.get('url', '')})"
        for r in stats[:6]
    )
    analysis_text = "\n".join(
        f"- {r.get('title', '')}: {r.get('snippet', '')}"
        for r in analysis[:6]
    )
    news_text = "\n".join(
        f"- {n.get('title', '')}: {n.get('snippet', '')}"
        for n in news[:6]
    )
    scraped_text = ""
    if scraped and scraped.get("content"):
        scraped_text = f"{scraped['title']}\n{scraped['content']}"

    response = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a sports analyst and researcher. Use ONLY the provided "
                    "sources. Do not invent statistics, advanced metrics, contract "
                    "figures, injury details, or player names — if a detail is not in "
                    "the sources, say 'not found in sources.' Be precise about "
                    "advanced metrics and explain what they mean. This is for "
                    "research, journalism, and fan analysis — never frame output "
                    "toward gambling or betting predictions."
                ),
            },
            {
                "role": "user",
                "content": f"""Write a sports research brief for: {query}

Landscape (synthesis):
{landscape}

Authoritative Stats (Sports Reference / FBref / Basketball-Reference):
{stats_text}

Recent Analysis (scouting / transfers / contracts):
{analysis_text}

Scraped Stats Page / Player Profile / Analytics Article:
{scraped_text}

Recent Sports News:
{news_text}

Return JSON with ALL of these fields:
- subject: team, player, or analytical topic being researched
- sport: "soccer" | "american-football" | "basketball" | "baseball" | "ice-hockey" | "tennis" | "golf" | "rugby" | "cricket" | "mixed"
- league_or_competition: specific league, tournament, or competition (e.g., EPL, NFL, NBA, MLB, Champions League)
- performance_summary: current form and recent performance — last 5-10 games/matches, key statistical trends
- advanced_metrics: sport-appropriate advanced analytics — xG/xGA for soccer; WAR for baseball; PER/TS% for basketball; EPA/DVOA for American football; Corsi/Fenwick for hockey; with context on what they mean
- historical_context: career trajectory, historical comparisons, franchise/club history relevant to the query
- roster_and_personnel: key players, injury status, depth chart, recent roster moves, coaching staff context
- tactical_and_strategic_notes: formation/scheme, coaching philosophy, matchup considerations, tendencies vs. specific opponents
- transfer_and_contract_context: contract status, transfer rumors, salary cap implications (NFL/NBA), free agency timeline — if relevant
- media_and_public_narrative: prevailing media storylines, fan sentiment, key controversies or debates
- data_sources: array of sources used (e.g., "sports-reference.com", "fbref.com", and any others)
- data_quality: "high" | "medium" | "low" — based on coverage from Sports Reference/FBref and recent news""",
            },
        ],
        response_format={"type": "json_object"},
    )

    try:
        # content can be None, in which case json.loads raises TypeError
        brief = json.loads(response.choices[0].message.content)
        brief["note"] = NOTE
        return brief
    except (json.JSONDecodeError, TypeError):
        return None

def research_sports(query: str) -> dict | None:
    """Run the full sports research pipeline."""
    print(f"Researching: {query}")

    print("Synthesizing landscape...")
    landscape = research_landscape(query)

    print("Searching authoritative stats...")
    stats = search_stats(query)

    print("Searching recent analysis...")
    analysis = search_analysis(query)

    print("Scraping a relevant stats page / profile / article...")
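    scraped = None
    stats_domains = (
        "sports-reference.com", "baseball-reference.com",
        "pro-football-reference.com", "basketball-reference.com",
        "hockey-reference.com", "fbref.com",
    )
    for result in stats + analysis:
        url = result.get("url")
        if url and any(domain in url for domain in stats_domains):
            candidate = scrape_page(url)
            if candidate.get("content"):
                scraped = candidate  # keep the first page that returned content
                break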

    print("Pulling recent sports news...")
    news = get_news(query)

    print("Generating sports brief...")
    return generate_brief(query, landscape, stats, analysis, scraped, news)

def print_brief(brief: dict | None) -> None:
    if not brief:
        print("Could not generate brief.")
        return
    print(f"\n{'='*60}")
    print("Sports Research Brief")
    print(f"{'='*60}")
    print(f"\nSubject: {brief.get('subject', '')}")
    print(f"Sport: {brief.get('sport', '')}")
    print(f"League/Competition: {brief.get('league_or_competition', '')}")
    print(f"\nPerformance Summary:\n{brief.get('performance_summary', '')}")
    print(f"\nAdvanced Metrics:\n{brief.get('advanced_metrics', '')}")
    print(f"\nHistorical Context:\n{brief.get('historical_context', '')}")
    print(f"\nRoster & Personnel:\n{brief.get('roster_and_personnel', '')}")
    print(f"\nTactical & Strategic Notes:\n{brief.get('tactical_and_strategic_notes', '')}")
    print(f"\nTransfer & Contract Context:\n{brief.get('transfer_and_contract_context', '')}")
    print(f"\nMedia & Public Narrative:\n{brief.get('media_and_public_narrative', '')}")
    print(f"\nData Sources: {', '.join(map(str, brief.get('data_sources') or []))}")
    print(f"\nData Quality: {brief.get('data_quality', '?')}")
    print(f"\n{brief.get('note', '')}")

if __name__ == "__main__":
    import sys
    query = sys.argv[1] if len(sys.argv) > 1 else "Manchester City Premier League season analysis 2024"
    brief = research_sports(query)
    print_brief(brief)
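
Because the brief is parsed from LLM output, a field can be missing or malformed even when the JSON itself is valid. A minimal sketch of a post-check follows; `check_brief` is a hypothetical helper, and the field names are the ones listed in the prompt above.

```python
REQUIRED_FIELDS = (
    "subject", "sport", "league_or_competition", "performance_summary",
    "advanced_metrics", "historical_context", "roster_and_personnel",
    "tactical_and_strategic_notes", "transfer_and_contract_context",
    "media_and_public_narrative", "data_sources", "data_quality",
)
ALLOWED_QUALITY = ("high", "medium", "low")


def check_brief(brief: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if name not in brief
    ]
    if "data_sources" in brief and not isinstance(brief["data_sources"], list):
        problems.append("data_sources should be a list")
    if "data_quality" in brief and brief["data_quality"] not in ALLOWED_QUALITY:
        problems.append("data_quality should be high, medium, or low")
    return problems


# Demonstration on a deliberately incomplete brief
sample = {"subject": "Manchester City", "data_sources": "fbref.com"}
for problem in check_brief(sample):
    print(problem)
```

One way to use it is to call `check_brief(brief)` right after `research_sports(query)` and log or retry when the list is not empty.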

Usage examples

"Manchester City Premier League season analysis 2024" — maps squad depth and xG/xGA trends from FBref, traces Pep Guardiola's tactical adjustments, summarizes transfer activity, and positions City in the title race.
"Shohei Ohtani Dodgers 2024 season performance" — pulls batting stats versus league leaders from Baseball-Reference, frames WAR context and historical two-way player comparisons, notes injury history, and explains the contract's impact on team payroll.
"NFL quarterback evaluation 2024 draft class" — surfaces advanced metrics (EPA, completion percentage over expectation), draws college-to-pro historical comps, and weighs team needs against draft capital.

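To run several queries like these in one go and keep the results, the briefs can be written to disk as JSON. This is a sketch under assumptions: `slugify` and `save_briefs` are hypothetical helpers, the `briefs` directory name is arbitrary, and `run` stands for any function that takes a query and returns a dict or None, such as `research_sports` from the full example.

```python
import json
import re
from pathlib import Path


def slugify(query: str) -> str:
    """Turn a query into a safe, lowercase file name stem."""
    slug = re.sub(r"[^a-z0-9]+", "-", query.lower()).strip("-")
    return slug or "brief"


def save_briefs(queries, run, out_dir="briefs"):
    """Run each query and write every non-empty brief to <out_dir>/<slug>.json."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for query in queries:
        brief = run(query)
        if brief is None:  # the pipeline returns None when parsing fails
            continue
        path = out / f"{slugify(query)}.json"
        path.write_text(
            json.dumps(brief, indent=2, ensure_ascii=False), encoding="utf-8"
        )
        saved.append(path)
    return saved


print(slugify("Shohei Ohtani Dodgers 2024 season performance"))
# shohei-ohtani-dodgers-2024-season-performance
```

With the full example in the same module, `save_briefs(["Manchester City Premier League season analysis 2024"], research_sports)` would write one file per successful brief.
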
Getting your API key

Grab a free Superhighway key at /pricing (1,000 calls/month, no credit card). For an agent that provisions its own access, skip the key entirely with x402: it pays $0.002 per call in USDC on Base — no signup, no key management. See the x402 pay-per-call guide for the wallet setup.
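
As a rough budgeting aid, the figures above translate into briefs per month and cost per brief. The sketch assumes exactly five Superhighway calls per brief, as in the pipeline described in this guide; the scrape loop can make more than one /scrape call, and LLM costs are not included.

```python
from fractions import Fraction

CALLS_PER_BRIEF = 5                      # assumption: one call per pipeline step
FREE_CALLS_PER_MONTH = 1_000             # free tier quoted above
X402_PRICE_PER_CALL = Fraction(2, 1000)  # $0.002 per call, quoted above


def briefs_on_free_tier(calls_per_brief: int = CALLS_PER_BRIEF) -> int:
    """How many full briefs fit into the monthly free quota."""
    return FREE_CALLS_PER_MONTH // calls_per_brief


def x402_cost_usdc(briefs: int, calls_per_brief: int = CALLS_PER_BRIEF) -> float:
    """Superhighway cost in USDC for a number of briefs paid per call."""
    return float(X402_PRICE_PER_CALL * calls_per_brief * briefs)


print(briefs_on_free_tier())  # 200
print(x402_cost_usdc(1))      # 0.01
print(x402_cost_usdc(100))    # 1.0
```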