Build a competitor analysis agent
Most competitive intelligence is done by hand: someone opens a competitor's pricing page, reads their blog, skims the news, and writes up notes that are stale within a week. This guide builds a Python agent that does the whole loop automatically — find the pages, scrape them clean, pull recent news, synthesize across sources, and hand an LLM the raw material to write a structured report. It's the first guide that chains four Superhighway endpoints into one production pipeline.
1. What you'll build
A single command — python competitor_analysis.py "Exa" "exa.ai" — that produces a structured competitive intelligence report covering:
- Pricing — plan names, prices, and what each tier includes
- Recent announcements and product changes
- Key messaging and positioning — how they describe themselves
- Potential weaknesses or gaps — where they appear silent or thin
- Recent news — funding, launches, partnerships
The pipeline uses four Superhighway endpoints, each doing the part it's best at: /search finds the right pages, /scrape turns each into clean Markdown, /news pulls recent coverage, and /research synthesizes across many sources. An LLM stitches it all into the final report.
2. Setup
```bash
pip install openai requests python-dotenv
```
Create a .env file with your two keys:
```
SUPERHIGHWAY_API_KEY=your_key_here
OPENAI_API_KEY=your_key_here
```
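If either key is missing, `os.getenv` returns `None` and every request goes out with an `Authorization: Bearer None` header, which the API will presumably reject much later in the run. Below is a minimal sketch of a fail-fast check you could run at startup, right after loading `.env`. `require_env` is a hypothetical helper and is not part of the guide's script:

```python
import os

# Hypothetical helper: the key names match the .env file above
REQUIRED_KEYS = ("SUPERHIGHWAY_API_KEY", "OPENAI_API_KEY")


def require_env(names: tuple[str, ...] = REQUIRED_KEYS) -> dict[str, str]:
    """Return the named environment variables, raising if any is unset or empty."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}
```

Call it immediately after python-dotenv's `load_dotenv()`. That way a typo in `.env` stops the run before any API calls are made.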
3. Find the competitor's key pages
Start with /search. A few targeted queries surface the pricing page, changelog, and feature pages. The first two use the site: operator to stay on the competitor's own domain. The third searches more broadly, so it can also pick up third-party pages about the product. We dedupe by URL so that a page returned by two queries is only scraped once.
```python
import os

import requests
from dotenv import load_dotenv

load_dotenv()  # load SUPERHIGHWAY_API_KEY and OPENAI_API_KEY from .env

SUPERHIGHWAY_KEY = os.getenv("SUPERHIGHWAY_API_KEY")
BASE = "https://superhighway.walls.sh"


def find_competitor_pages(company: str, domain: str) -> list[dict]:
    """Find pricing, changelog, and feature pages for a competitor."""
    queries = [
        f"site:{domain} pricing plans",
        f"site:{domain} changelog OR releases",
        f"{company} product features 2025",
    ]
    pages = []
    seen_urls = set()
    for q in queries:
        r = requests.get(
            f"{BASE}/search",
            params={"q": q, "limit": 3},
            headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
            timeout=30,
        )
        r.raise_for_status()  # surface auth or quota errors instead of parsing an error body
        for result in r.json().get("results", []):
            # Dedupe: the same page can come back from more than one query
            if result["url"] not in seen_urls:
                seen_urls.add(result["url"])
                pages.append(result)
    return pages
```
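The `seen_urls` check above only catches exact string matches, so `https://exa.ai/pricing` and `https://exa.ai/pricing/` would both be scraped. Here is a sketch of a slightly stricter dedupe that normalizes each URL first. `normalize_url` and `dedupe_results` are hypothetical helpers. The normalization rules (lowercase scheme and host, drop the `#fragment`, strip a trailing slash) are assumptions that suit most marketing sites but not every server:

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lowercase scheme and host, drop the fragment, and strip a trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    # Query string is kept: it can select a different page on some sites
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def dedupe_results(batches: list[list[dict]]) -> list[dict]:
    """Merge result lists from several queries, keeping the first hit per normalized URL."""
    seen: set[str] = set()
    merged: list[dict] = []
    for batch in batches:
        for result in batch:
            key = normalize_url(result["url"])
            if key not in seen:
                seen.add(key)
                merged.append(result)
    return merged
```

To use it, collect each query's `results` list into `batches` and return `dedupe_results(batches)` in place of the inline `seen_urls` loop.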
4. Scrape each page to Markdown
Search results give you titles and snippets, but a real pricing analysis needs the full page. /scrape returns the page as clean Markdown — no nav, no scripts, no cookie banners — so the LLM sees only the content that matters. We truncate to keep each page inside the model's context budget.
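```python
def scrape_page(url: str) -> str:
    """Fetch a page through /scrape and return its Markdown, truncated to 3,000 characters."""
    r = requests.get(
        f"{BASE}/scrape",
        params={"url": url},
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
        timeout=60,
    )
    if not r.ok:
        return ""  # one unscrapeable page shouldn't abort the whole run
    return r.json().get("markdown", "")[:3000]  # truncate to fit the LLM's context
```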
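A plain `[:3000]` slice can stop in the middle of a pricing-table row, which leaves the model half a row to interpret. The variation sketched here as a hypothetical `truncate_markdown` helper backs up to the last line break inside the limit, so tables and lists end on a complete line:

```python
def truncate_markdown(md: str, limit: int = 3000) -> str:
    """Trim Markdown to at most `limit` characters, preferring to cut at a line break."""
    if len(md) <= limit:
        return md
    cut = md.rfind("\n", 0, limit)
    if cut <= 0:
        cut = limit  # no usable line break inside the window: fall back to a hard cut
    return md[:cut].rstrip()
```

In `scrape_page`, you would return `truncate_markdown(r.json().get("markdown", ""), 3000)` instead of slicing directly.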
5. Get recent news about the competitor
Pricing and feature pages tell you the present state; /news tells you what's moving. Funding rounds, product launches, and partnership announcements are exactly the signals a competitive report should flag.
```python
def get_competitor_news(company: str) -> list[dict]:
    """Return up to five recent articles about the company from /news."""
    r = requests.get(
        f"{BASE}/news",
        params={"q": f"{company} product announcement funding", "count": 5},
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
        timeout=30,
    )
    r.raise_for_status()
    return r.json().get("articles", [])
```
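Short or generic company names can pull in unrelated coverage, because a news query may return articles where the name appears only incidentally or refers to something else. Below is a hedged sketch of a post-filter that keeps only articles naming the company as a whole word. `filter_relevant` is hypothetical. It assumes each article carries the `title` and `description` fields that `generate_report` reads later:

```python
import re


def filter_relevant(articles: list[dict], company: str) -> list[dict]:
    """Keep articles whose title or description mentions `company` as a whole word."""
    # \b on both sides stops "Exa" from matching inside "Exabyte"
    pattern = re.compile(rf"\b{re.escape(company)}\b", re.IGNORECASE)
    return [
        a
        for a in articles
        if pattern.search(f"{a.get('title', '')} {a.get('description', '')}")
    ]
```

The filter can only remove articles. If it is too aggressive, request a larger `count` from /news and filter afterwards.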
6. Deep research synthesis with /research
The competitor's own pages are inherently one-sided. /research reads across many independent sources — reviews, comparisons, third-party write-ups — and returns a synthesized view. This is where you catch the things a company won't say about itself.
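```python
def research_competitor(company: str) -> str:
    """Return /research's cross-source synthesis, truncated to 4,000 characters."""
    r = requests.get(
        f"{BASE}/research",
        params={"q": f"{company} product pricing features competitors", "pages": 5},
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
        timeout=120,  # research reads several sources, so allow more time
    )
    r.raise_for_status()
    data = r.json()
    # /research returns synthesized Markdown; fall back to "markdown" if "synthesis" is absent or empty
    return (data.get("synthesis") or data.get("markdown") or "")[:4000]
```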
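The four retrieval functions above repeat the same request pattern. If you would rather centralize it, here is a sketch of one shared helper that adds a timeout and retries rate-limit and server errors with exponential backoff. `get_json` is hypothetical. Which status codes Superhighway returns when throttling is an assumption (429 is the conventional one). The session is passed in, so you can supply a `requests.Session()` in production or a stub in tests:

```python
import time

# Status codes treated as transient (assumption: 429 for throttling, 5xx for server errors)
RETRYABLE = {429, 500, 502, 503, 504}


def get_json(session, url: str, params: dict, api_key: str,
             retries: int = 3, backoff: float = 1.0, timeout: float = 30.0) -> dict:
    """GET a JSON endpoint with a bearer token, retrying transient failures."""
    headers = {"Authorization": f"Bearer {api_key}"}
    for attempt in range(retries + 1):
        r = session.get(url, params=params, headers=headers, timeout=timeout)
        if r.status_code in RETRYABLE and attempt < retries:
            time.sleep(backoff * 2 ** attempt)  # 1s, 2s, 4s with the defaults
            continue
        r.raise_for_status()  # non-retryable errors, or retries exhausted
        return r.json()
```

Usage might look like `get_json(requests.Session(), f"{BASE}/news", {"q": "Exa funding", "count": 5}, SUPERHIGHWAY_KEY)`. Reusing one session also lets `requests` keep the connection to the API open between calls.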
7. Generate the competitive intelligence report
Now the LLM. Everything above is retrieval; this step is synthesis. We hand the model the scraped pages, the news headlines, and the research synthesis, and ask it to write a fixed-structure report. The system prompt pins it to the sources ("based only on the provided sources"). This reduces, but does not eliminate, the risk that the model invents pricing or features.
```python
import os

from openai import OpenAI

llm = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))


def generate_report(
    company: str,
    pages: list[dict],
    page_contents: dict[str, str],
    news: list[dict],
    research: str,
) -> str:
    # Build context: scraped Markdown per page (capped at 3,000 characters, matching
    # scrape_page), falling back to the search snippet when a scrape is missing or empty
    pages_text = "\n\n".join(
        f"### {p.get('title', p['url'])} ({p['url']})\n"
        f"{(page_contents.get(p['url']) or p.get('content', ''))[:3000]}"
        for p in pages[:4]
    )
    news_text = "\n".join(
        f"- {a.get('title', '')} ({a.get('source', '')}): {a.get('description', '')}"
        for a in news[:5]
    )
    response = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": "You are a competitive intelligence analyst. Write structured, "
                "factual reports based only on the provided sources.",
            },
            {"role": "user", "content": f"""Analyze {company} as a competitor based on these sources:

## Web Pages
{pages_text}

## Recent News
{news_text}

## Research Synthesis
{research}

Write a competitive intelligence report with these sections:
1. **Pricing** — plans, prices, what's included
2. **Key Features** — main product capabilities
3. **Messaging & Positioning** — how they describe themselves
4. **Recent Changes** — new features, announcements, pricing changes
5. **Potential Gaps** — areas where they appear weak or silent
6. **Summary** — 2-3 sentence executive summary
"""},
        ],
    )
    return response.choices[0].message.content
```
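The prompt asks for six fixed sections, but nothing guarantees that the model returns all of them. A cheap post-check, sketched as a hypothetical `missing_sections` helper, scans the output for each section name as a Markdown heading or bold label. The accepted formats are assumptions about how the model tends to mark sections, so adjust the pattern if your reports look different:

```python
import re

# Section names taken from the prompt in generate_report
EXPECTED_SECTIONS = (
    "Pricing",
    "Key Features",
    "Messaging & Positioning",
    "Recent Changes",
    "Potential Gaps",
    "Summary",
)


def missing_sections(report: str, sections: tuple[str, ...] = EXPECTED_SECTIONS) -> list[str]:
    """Return the expected sections that appear neither as a heading nor as a **bold** label."""
    missing = []
    for name in sections:
        escaped = re.escape(name)
        # Matches "## Pricing", "### 1. Pricing", or "**Pricing**"
        pattern = rf"^#+\s*(?:\d+\.\s*)?{escaped}\b|\*\*{escaped}\*\*"
        if not re.search(pattern, report, re.MULTILINE | re.IGNORECASE):
            missing.append(name)
    return missing
```

If the returned list is not empty, you could retry the call once, or save the report with a note listing the missing sections.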
8. The full pipeline
Wire the steps together. The orchestrator finds pages, scrapes the top four, pulls news, runs deep research, and feeds everything to the report generator.
```python
def analyze_competitor(company: str, domain: str) -> str:
    print(f"Analyzing {company}...")

    # Step 1: Find key pages
    pages = find_competitor_pages(company, domain)
    print(f"Found {len(pages)} pages")

    # Step 2: Scrape the top four pages (bounds both API calls and context size)
    page_contents = {}
    for page in pages[:4]:
        page_contents[page["url"]] = scrape_page(page["url"])

    # Step 3: Get recent news
    news = get_competitor_news(company)

    # Step 4: Deep research across third-party sources
    research = research_competitor(company)

    # Step 5: Generate the report
    return generate_report(company, pages, page_contents, news, research)


if __name__ == "__main__":
    import sys

    company = sys.argv[1] if len(sys.argv) > 1 else "Exa"
    domain = sys.argv[2] if len(sys.argv) > 2 else "exa.ai"

    report = analyze_competitor(company, domain)
    print(report)

    # Save to a Markdown file, e.g. "Brave Search" -> brave_search_analysis.md
    filename = f"{company.lower().replace(' ', '_')}_analysis.md"
    with open(filename, "w", encoding="utf-8") as f:
        f.write(f"# Competitive Analysis: {company}\n\n")
        f.write(report)
    print(f"\nSaved to {filename}")
```
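The pipeline sends every request one after another, but the four scrapes do not depend on each other. Here is a sketch of running them concurrently with the standard library's `ThreadPoolExecutor`. `scrape_all` is a hypothetical helper that takes the fetch function as a parameter, so it can be tested offline. Mapping a failed fetch to an empty string is a design choice that keeps one bad page from aborting the run:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def scrape_all(
    urls: list[str], fetch: Callable[[str], str], max_workers: int = 4
) -> dict[str, str]:
    """Fetch every URL concurrently and return {url: content}."""

    def safe_fetch(url: str) -> str:
        try:
            return fetch(url)
        except Exception:
            return ""  # a failed page contributes nothing rather than crashing the run

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `urls`
        return dict(zip(urls, pool.map(safe_fetch, urls)))
```

Inside `analyze_competitor`, Step 2 would become `page_contents = scrape_all([p["url"] for p in pages[:4]], scrape_page)`. Keep `max_workers` small, on the assumption that the API's rate limit only tolerates a few parallel requests.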
9. Running it
Pass a company name and domain on the command line:
```bash
# Analyze a SaaS pricing competitor
python competitor_analysis.py "Firecrawl" "firecrawl.dev"

# Analyze a broader player
python competitor_analysis.py "Brave Search" "search.brave.com"

# Run on several competitors; each writes its own <company>_analysis.md
for company_domain in "Exa exa.ai" "Tavily tavily.com"; do
  company=$(echo "$company_domain" | cut -d' ' -f1)
  domain=$(echo "$company_domain" | cut -d' ' -f2)
  python competitor_analysis.py "$company" "$domain"
done
```
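The shell loop splits each pair on a space, so a multi-word name like "Brave Search" would be cut in half. One alternative is to keep the competitive set in a plain text file with one `Company, domain` pair per line and loop over it in Python. The file format and the `parse_competitors` helper shown here are hypothetical:

```python
def parse_competitors(text: str) -> list[tuple[str, str]]:
    """Parse 'Company Name, domain' lines; blank lines and '#' comments are skipped."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the last comma so company names may themselves contain commas
        company, sep, domain = line.rpartition(",")
        if not sep:
            raise ValueError(f"expected 'Company, domain', got: {line!r}")
        pairs.append((company.strip(), domain.strip()))
    return pairs
```

Each resulting pair can then be passed to `analyze_competitor(company, domain)`.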
10. Extending the agent
The single-run version is the foundation. From here, the high-value additions are:
- Schedule it: run weekly with cron and you have a standing competitive feed instead of a one-off snapshot
- Detect and alert on changes: diff this week's report against last week's and fire a notification when pricing or features move (pair it with the web change detection guide)
- Store reports with timestamps: keep them in a database so you can track how a competitor's positioning drifts over months
- Capture landing pages: add Superhighway's /images endpoint to snapshot competitor pages visually alongside the text
- Build a comparison matrix: run across your whole competitive set and lay the reports side by side
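For the change-detection idea, Python's standard `difflib` is enough to start. One caveat is that two LLM-written reports are likely to differ in wording even when nothing changed. Diffing the scraped page Markdown (the `page_contents` values) therefore tends to give a cleaner signal than diffing the reports themselves. This is a minimal sketch; `report_diff` and its default labels are illustrative:

```python
import difflib


def report_diff(old: str, new: str,
                old_label: str = "last_week", new_label: str = "this_week") -> str:
    """Return a unified diff of two Markdown texts; an empty string means no change."""
    lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=old_label,
        tofile=new_label,
    )
    return "".join(lines)
```

A non-empty result is the trigger for a notification. The `-`/`+` lines show exactly which prices or features moved.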
11. Why four endpoints instead of one
You could try to do this with a single search call and a big prompt, but each endpoint earns its place:
- /search finds the canonical pages — you don't have to guess the URL of a competitor's changelog.
- /scrape gives the LLM clean Markdown instead of raw HTML, so pricing tables and feature lists survive intact and the context isn't wasted on markup.
- /news surfaces time-sensitive signals — a launch or a funding round — that static pages won't tell you.
- /research brings in the outside view, the third-party comparisons and reviews that reveal the gaps a company stays quiet about.
Together they give the LLM both the inside and outside picture, which is what separates a useful competitive report from a rephrased homepage.
12. Getting your API key
Grab a free Superhighway key at /pricing (1,000 calls/month, no credit card). For an agent that provisions its own access, skip the key entirely with x402: it pays $0.002 per call in USDC on Base — no signup, no key management. See the x402 pay-per-call guide for the wallet setup.
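To budget for scheduled runs, count the requests one `analyze_competitor` call makes: three /search queries, up to four /scrape calls, one /news call, and one /research call. Here is a small sketch of the arithmetic. It assumes every request is billed as one call regardless of endpoint, so check the pricing page, especially for /research with `pages=5`. The helper names are hypothetical:

```python
from decimal import Decimal

PRICE_PER_CALL = Decimal("0.002")  # x402 price quoted above, in USDC
FREE_CALLS_PER_MONTH = 1000        # free tier quoted above


def calls_per_run(search_queries: int = 3, pages_scraped: int = 4,
                  news_calls: int = 1, research_calls: int = 1) -> int:
    """Upper bound on Superhighway requests for one analyze_competitor() run."""
    return search_queries + pages_scraped + news_calls + research_calls


def cost_per_run(calls: int) -> Decimal:
    """x402 cost of one run, assuming a flat per-call price."""
    return calls * PRICE_PER_CALL


def free_runs_per_month(calls: int) -> int:
    """Full runs that fit inside the free monthly allowance."""
    return FREE_CALLS_PER_MONTH // calls
```

With the defaults, a run makes 9 calls. That works out to $0.018 per competitor over x402, and the free tier covers 111 full runs a month.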
From here, the search-and-read guide goes deeper on combining /search and /scrape, and the web change detection guide shows how to turn this into a scheduled monitor that alerts you the moment a competitor changes.