Sports Analytics Research Agent
Sports is a data-rich field where a single injury, a lineup change, or a tactical adjustment can reset the analytical picture overnight — and where a fair evaluation of a team or player turns on advanced metrics, historical comparisons, and the most recent roster and form data. This guide builds a Python agent for sports analysts, team front offices, sports journalists, fantasy sports operators, scouting departments, and sports media. It chains all four Superhighway endpoints — /research for the historical and tactical landscape, /search against authoritative stats databases and recent analysis, /scrape for a specific team or player stats page, and /news for injuries and roster moves — then uses an LLM to emit a structured sports brief as JSON.
Data note: Sports statistics and injury reports change rapidly. For the most current data, verify directly with official league sources (NFL.com, NBA.com, MLB.com, or the relevant league's official stats portal), as well as ESPN and the Sports Reference sites.
Overview
The agent takes a team, player, or analytical topic — "Manchester City Premier League season analysis 2024", "Shohei Ohtani Dodgers 2024 season performance" — and produces a structured sports research brief:
- Synthesizes the landscape: historical context, team/player career arc, tactical and strategic analysis, and the league landscape
- Searches authoritative sports statistics databases — Sports Reference (Baseball-Reference, Pro-Football-Reference, Basketball-Reference), FBref (soccer), Hockey-Reference — for historical and advanced metrics
- Searches recent analysis — scouting reports, transfer news, contract situations, coaching decisions, and tactical breakdowns
- Scrapes one relevant page: a team stats page on Sports Reference, an FBref player profile, or a detailed analytics article
- Pulls recent news: injury reports, lineup changes, roster moves (trades, signings, releases), coaching changes, and recent results
- Uses an LLM to generate a structured brief — performance summary, advanced metrics, historical context, roster, tactics, transfer/contract context, and media narrative as JSON
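Because the brief comes back as LLM-generated JSON, it can help to check its shape before using it. The following is a minimal sketch, assuming the field names and allowed values listed in the prompt of the full example; `validate_brief` is a hypothetical helper, not part of Superhighway or the OpenAI SDK.

```python
# Field names and allowed values mirror the prompt in the full example.
REQUIRED_FIELDS = [
    "subject", "sport", "league_or_competition", "performance_summary",
    "advanced_metrics", "historical_context", "roster_and_personnel",
    "tactical_and_strategic_notes", "transfer_and_contract_context",
    "media_and_public_narrative", "data_sources", "data_quality",
]
SPORTS = (
    "soccer", "american-football", "basketball", "baseball", "ice-hockey",
    "tennis", "golf", "rugby", "cricket", "mixed",
)
QUALITY = ("high", "medium", "low")


def validate_brief(brief: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape looks right."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS
                if name not in brief]
    if "sport" in brief and brief["sport"] not in SPORTS:
        problems.append(f"unexpected sport: {brief['sport']!r}")
    if "data_quality" in brief and brief["data_quality"] not in QUALITY:
        problems.append(f"unexpected data_quality: {brief['data_quality']!r}")
    if "data_sources" in brief and not isinstance(brief["data_sources"], list):
        problems.append("data_sources should be a list")
    return problems
```

A check like this only confirms structure. It says nothing about whether the statistics in the brief are accurate, so the data note above still applies.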
Who it's for: sports analysts, team front offices, sports journalists, fantasy sports operators, scouting departments, and sports media.
How it works
Five endpoint calls feed one LLM synthesis:
- /research: deep synthesis of historical context, team/player career arc, tactical and strategic analysis, and the league landscape.
- /search (authoritative stats): historical and advanced metrics scoped to Sports Reference, FBref, and Basketball-Reference.
- /search (recent analysis, time=month): scouting reports, transfer news, contract situations, coaching decisions, and tactical breakdowns.
- /scrape: one relevant URL, e.g. a team stats page on sports-reference.com, an FBref player profile, or a detailed analytics article.
- /news (time=week): injury reports, lineup changes, roster moves, coaching changes, and recent match/game results.
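The five calls can be laid out as plain data before any request is sent. This is a rough sketch that mirrors the methods and parameters used in the full example; the `Call` dataclass, `build_plan`, and the example.com URL are hypothetical and only illustrate the shape of the pipeline.

```python
from dataclasses import dataclass, field


@dataclass
class Call:
    step: str
    method: str
    endpoint: str
    params: dict = field(default_factory=dict)


def build_plan(query: str, scrape_url: str) -> list[Call]:
    """Describe the five calls for one query, in the order they run."""
    sites = "site:sports-reference.com OR site:fbref.com OR site:basketball-reference.com"
    return [
        Call("landscape", "GET", "/research",
             {"q": f"{query} sports analytics statistics performance"}),
        Call("stats", "GET", "/search",
             {"q": f"{query} statistics advanced metrics {sites}"}),
        Call("analysis", "GET", "/search",
             {"q": f"{query} sports analysis scouting report transfer market contract",
              "time": "month"}),
        # In the full example the URL is picked from the search results;
        # here it is passed in as a placeholder.
        Call("scrape", "POST", "/scrape", {"url": scrape_url, "mode": "markdown"}),
        Call("news", "GET", "/news",
             {"q": f"{query} sports injury roster lineup", "time": "week"}),
    ]


plan = build_plan("Shohei Ohtani Dodgers 2024 season performance",
                  "https://example.com/stats")
for call in plan:
    print(f"{call.method:4} {call.endpoint:10} {call.step}")
```

Five calls hit four distinct endpoints because /search is used twice, once for stats and once for recent analysis.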
Full example
pip install openai requests python-dotenv
Create a .env file with your two keys:
SUPERHIGHWAY_API_KEY=your_key_here
OPENAI_API_KEY=your_key_here
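Before running the agent, it may be worth confirming that both keys are actually visible to Python. The sketch below assumes the two variable names above; `missing_keys` is a hypothetical helper, and the python-dotenv import is optional so the check also works with variables exported in the shell.

```python
import os

try:
    # python-dotenv (installed above) loads entries from .env into os.environ.
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    # Without python-dotenv, rely on variables already set in the shell.
    pass

REQUIRED_KEYS = ["SUPERHIGHWAY_API_KEY", "OPENAI_API_KEY"]


def missing_keys(names, env=None):
    """Return the names that are unset or empty in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_keys(REQUIRED_KEYS)
print("Missing keys:", ", ".join(missing) if missing else "none")
```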
import json
import os

import requests
from dotenv import load_dotenv
from openai import OpenAI

# Load SUPERHIGHWAY_API_KEY and OPENAI_API_KEY from the .env file.
load_dotenv()

SUPERHIGHWAY_KEY = os.getenv("SUPERHIGHWAY_API_KEY")
BASE = "https://superhighway.walls.sh"
HEADERS = {"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"}
llm = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

NOTE = (
    "Sports statistics and injury reports change rapidly. For the most current "
    "data, verify directly with official league sources (NFL.com, NBA.com, "
    "MLB.com, ESPN, the league's official stats portal) and Sports Reference "
    "sites. This tool is for research, journalism, and fan analysis, not for "
    "gambling predictions."
)

# 1. Deep synthesis of the historical & tactical landscape
def research_landscape(query: str) -> str:
    """Historical context, team/player career arc, tactics, league landscape."""
    r = requests.get(
        f"{BASE}/research",
        params={"q": f"{query} sports analytics statistics performance"},
        headers=HEADERS,
    )
    data = r.json()
    return data.get("summary", "")[:3000]

# 2. Authoritative stats: Sports Reference, FBref, Basketball-Reference
def search_stats(query: str) -> list[dict]:
    """Historical and advanced metrics from Sports Reference and FBref."""
    r = requests.get(
        f"{BASE}/search",
        params={
            "q": f"{query} statistics advanced metrics "
                 f"site:sports-reference.com OR site:fbref.com "
                 f"OR site:basketball-reference.com",
        },
        headers=HEADERS,
    )
    return r.json().get("results", [])

# 3. Recent analysis: scouting, transfers, contracts (last month)
def search_analysis(query: str) -> list[dict]:
    """Scouting reports, transfer news, contracts, coaching, tactical breakdowns."""
    r = requests.get(
        f"{BASE}/search",
        params={
            "q": f"{query} sports analysis scouting report "
                 f"transfer market contract",
            "time": "month",
        },
        headers=HEADERS,
    )
    return r.json().get("results", [])

# 4. Scrape one relevant stats page / player profile / analytics article
def scrape_page(url: str) -> dict:
    """Pull a Sports Reference team page, an FBref player profile, or an article."""
    r = requests.post(
        f"{BASE}/scrape",
        json={"url": url, "mode": "markdown"},
        headers=HEADERS,
    )
    data = r.json()
    return {
        "url": url,
        "title": data.get("title", ""),
        "content": data.get("markdown", data.get("text", ""))[:2500],
    }

# 5. Recent sports news: injuries, lineups, roster moves (last week)
def get_news(query: str) -> list[dict]:
    """Injury reports, lineup changes, roster moves, coaching changes, results."""
    r = requests.get(
        f"{BASE}/news",
        params={
            "q": f"{query} sports injury roster lineup",
            "time": "week",
        },
        headers=HEADERS,
    )
    return r.json().get("results", [])

def generate_brief(
    query: str,
    landscape: str,
    stats: list[dict],
    analysis: list[dict],
    scraped: dict | None,
    news: list[dict],
) -> dict | None:
    """Generate a structured sports research brief as JSON."""
    stats_text = "\n".join(
        f"- {r.get('title', '')}: {r.get('snippet', '')} ({r.get('url', '')})"
        for r in stats[:6]
    )
    analysis_text = "\n".join(
        f"- {r.get('title', '')}: {r.get('snippet', '')}"
        for r in analysis[:6]
    )
    news_text = "\n".join(
        f"- {n.get('title', '')}: {n.get('snippet', '')}"
        for n in news[:6]
    )
    scraped_text = ""
    if scraped and scraped.get("content"):
        scraped_text = f"{scraped['title']}\n{scraped['content']}"
    response = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a sports analyst and researcher. Use ONLY the provided "
                    "sources. Do not invent statistics, advanced metrics, contract "
                    "figures, injury details, or player names. If a detail is not in "
                    "the sources, say 'not found in sources.' Be precise about "
                    "advanced metrics and explain what they mean. This is for "
                    "research, journalism, and fan analysis. Never frame output "
                    "toward gambling or betting predictions."
                ),
            },
            {
                "role": "user",
                "content": f"""Write a sports research brief for: {query}
Landscape (synthesis):
{landscape}
Authoritative Stats (Sports Reference / FBref / Basketball-Reference):
{stats_text}
Recent Analysis (scouting / transfers / contracts):
{analysis_text}
Scraped Stats Page / Player Profile / Analytics Article:
{scraped_text}
Recent Sports News:
{news_text}
Return JSON with ALL of these fields:
- subject: team, player, or analytical topic being researched
- sport: "soccer" | "american-football" | "basketball" | "baseball" | "ice-hockey" | "tennis" | "golf" | "rugby" | "cricket" | "mixed"
- league_or_competition: specific league, tournament, or competition (e.g., EPL, NFL, NBA, MLB, Champions League)
- performance_summary: current form and recent performance — last 5-10 games/matches, key statistical trends
- advanced_metrics: sport-appropriate advanced analytics — xG/xGA for soccer; WAR for baseball; PER/TS% for basketball; EPA/DVOA for American football; Corsi/Fenwick for hockey; with context on what they mean
- historical_context: career trajectory, historical comparisons, franchise/club history relevant to the query
- roster_and_personnel: key players, injury status, depth chart, recent roster moves, coaching staff context
- tactical_and_strategic_notes: formation/scheme, coaching philosophy, matchup considerations, tendencies vs. specific opponents
- transfer_and_contract_context: contract status, transfer rumors, salary cap implications (NFL/NBA), free agency timeline — if relevant
- media_and_public_narrative: prevailing media storylines, fan sentiment, key controversies or debates
- data_sources: array of sources used (e.g., "sports-reference.com", "fbref.com", and any others)
- data_quality: "high" | "medium" | "low" — based on coverage from Sports Reference/FBref and recent news""",
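            },
        ],
        response_format={"type": "json_object"},
    )
    try:
        brief = json.loads(response.choices[0].message.content)
        brief["note"] = NOTE
        return brief
    except (json.JSONDecodeError, TypeError):
        # JSONDecodeError: the model returned malformed JSON.
        # TypeError: the message content was None or not a JSON object.
        return None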

def research_sports(query: str) -> dict | None:
    """Run the full sports research pipeline."""
    print(f"Researching: {query}")
    print("Synthesizing landscape...")
    landscape = research_landscape(query)
    print("Searching authoritative stats...")
    stats = search_stats(query)
    print("Searching recent analysis...")
    analysis = search_analysis(query)
    print("Scraping a relevant stats page / profile / article...")
    scraped = None
    # Scrape the first stats-site result that returns usable content.
    for result in stats + analysis:
        url = result.get("url")
        if url and ("sports-reference.com" in url or "fbref.com" in url
                    or "basketball-reference.com" in url):
            scraped = scrape_page(url)
            if scraped.get("content"):
                break
    print("Pulling recent sports news...")
    news = get_news(query)
    print("Generating sports brief...")
    return generate_brief(query, landscape, stats, analysis, scraped, news)

def print_brief(brief: dict | None) -> None:
    if not brief:
        print("Could not generate brief.")
        return
    print(f"\n{'='*60}")
    print("Sports Research Brief")
    print(f"{'='*60}")
    print(f"\nSubject: {brief.get('subject', '')}")
    print(f"Sport: {brief.get('sport', '')}")
    print(f"League/Competition: {brief.get('league_or_competition', '')}")
    print(f"\nPerformance Summary:\n{brief.get('performance_summary', '')}")
    print(f"\nAdvanced Metrics:\n{brief.get('advanced_metrics', '')}")
    print(f"\nHistorical Context:\n{brief.get('historical_context', '')}")
    print(f"\nRoster & Personnel:\n{brief.get('roster_and_personnel', '')}")
    print(f"\nTactical & Strategic Notes:\n{brief.get('tactical_and_strategic_notes', '')}")
    print(f"\nTransfer & Contract Context:\n{brief.get('transfer_and_contract_context', '')}")
    print(f"\nMedia & Public Narrative:\n{brief.get('media_and_public_narrative', '')}")
    print(f"\nData Sources: {', '.join(brief.get('data_sources', []))}")
    print(f"\nData Quality: {brief.get('data_quality', '?')}")
    print(f"\n{brief.get('note', '')}")

if __name__ == "__main__":
    import sys
    query = sys.argv[1] if len(sys.argv) > 1 else "Manchester City Premier League season analysis 2024"
    brief = research_sports(query)
    print_brief(brief)
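The example above sends each request once, with no timeout or status check, so a slow or failing endpoint can stall or crash the run. One possible hardening step is a small retry wrapper with exponential backoff. This is a sketch only: `with_retries` is a hypothetical helper, and the attempt count and delays are arbitrary assumptions rather than anything Superhighway recommends.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    fetch: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(); on an exception, wait and retry with doubling delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            # Re-raise after the final attempt instead of sleeping again.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

In the full example, each `requests.get` or `requests.post` call could be moved into a small function that passes `timeout=` and calls `r.raise_for_status()` before `r.json()`, then handed to `with_retries`. Catching every exception keeps the sketch short; retrying only on `requests.RequestException` would be a narrower choice.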
Usage examples
- "Manchester City Premier League season analysis 2024" — maps squad depth and xG/xGA trends from FBref, traces Pep Guardiola's tactical adjustments, summarizes transfer activity, and positions City in the title race.
- "Shohei Ohtani Dodgers 2024 season performance" — pulls batting stats versus league leaders from Baseball-Reference, frames WAR context and historical two-way player comparisons, notes injury history, and explains the contract's impact on team payroll.
- "NFL quarterback evaluation 2024 draft class" — surfaces advanced metrics (EPA, completion percentage over expectation), draws college-to-pro historical comps, and weighs team needs against draft capital.
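To run several queries like these in one go, a small batch helper can write each brief to its own JSON file. This is a minimal sketch: `run_batch` and `slugify` are hypothetical helpers, the `briefs` output directory is an assumption, and `run` stands in for the `research_sports` function from the full example.

```python
import json
import re
from pathlib import Path
from typing import Callable, Optional


def slugify(text: str) -> str:
    """Turn a query into a lowercase, hyphen-separated file name stem."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def run_batch(
    queries: list[str],
    run: Callable[[str], Optional[dict]],
    out_dir: str = "briefs",
) -> dict[str, Optional[Path]]:
    """Run each query and save its brief; map each query to its file or None."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: dict[str, Optional[Path]] = {}
    for query in queries:
        brief = run(query)
        if brief is None:
            # The pipeline returns None when the LLM output cannot be parsed.
            written[query] = None
            continue
        path = out / f"{slugify(query)}.json"
        path.write_text(json.dumps(brief, indent=2, ensure_ascii=False),
                        encoding="utf-8")
        written[query] = path
    return written
```

With the full example in scope, `run_batch([...], research_sports)` would process the three queries above in sequence. Each query triggers its own set of endpoint calls, so a batch draws on the monthly call allowance accordingly.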
Getting your API key
Grab a free Superhighway key at /pricing (1,000 calls/month, no credit card). For an agent that provisions its own access, skip the key entirely with x402: it pays $0.002 per call in USDC on Base — no signup, no key management. See the x402 pay-per-call guide for the wallet setup.
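The figures above translate into a rough per-brief budget. The sketch below assumes that each endpoint request counts as one call and that a brief uses exactly five of them; the scrape step in the full example can issue extra calls when a page returns no content, and OpenAI usage is billed separately.

```python
CALLS_PER_BRIEF = 5           # /research, /search x2, /scrape, /news
FREE_CALLS_PER_MONTH = 1_000  # free tier quoted above
X402_USD_PER_CALL = 0.002     # x402 price quoted above


def briefs_on_free_tier(calls_per_brief: int = CALLS_PER_BRIEF) -> int:
    """Whole briefs that fit in the monthly free allowance."""
    return FREE_CALLS_PER_MONTH // calls_per_brief


def x402_cost_usd(briefs: int, calls_per_brief: int = CALLS_PER_BRIEF) -> float:
    """Approximate USDC cost for a number of briefs paid per call."""
    return round(briefs * calls_per_brief * X402_USD_PER_CALL, 6)


print(briefs_on_free_tier())  # 200 briefs per month
print(x402_cost_usd(1))       # about 0.01 USD per brief
```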
See also
The competitor analysis agent uses the same four-endpoint pattern to profile rival organizations, and the brand monitoring agent applies the same search-and-news pattern to tracking mentions, sentiment, and narrative.