Build a competitor analysis agent

Most competitive intelligence is done by hand: someone opens a competitor's pricing page, reads their blog, skims the news, and writes up notes that are stale within a week. This guide builds a Python agent that does the whole loop automatically — find the pages, scrape them clean, pull recent news, synthesize across sources, and hand an LLM the raw material to write a structured report. It's the first guide that chains four Superhighway endpoints into one production pipeline.

1. What you'll build

A single command — python competitor_analysis.py "Exa" "exa.ai" — that produces a structured competitive intelligence report covering:

Pricing: plan names, prices, and what each tier includes
Key features: the main product capabilities
Messaging and positioning: how they describe themselves
Recent changes: announcements, launches, funding, and pricing changes
Potential gaps: where they appear weak or silent
Summary: a 2-3 sentence executive summary

The pipeline uses four Superhighway endpoints, each doing the part it's best at: /search finds the right pages, /scrape turns each into clean Markdown, /news pulls recent coverage, and /research synthesizes across many sources. An LLM stitches it all into the final report.

2. Setup

pip install openai requests python-dotenv

Create a .env file with your two keys:

SUPERHIGHWAY_API_KEY=your_key_here
OPENAI_API_KEY=your_key_here
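
A missing key otherwise shows up late, as an authentication error in the middle of a run. The following is a minimal sketch of a startup check; `require_env` is a hypothetical helper name, not part of any library used in this guide.

```python
import os


def require_env(names, env=None):
    """Return the requested variables, or raise naming every missing one."""
    env = os.environ if env is None else env
    # Treat unset and empty values the same way: both are unusable keys.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in names}
```

Called as `require_env(["SUPERHIGHWAY_API_KEY", "OPENAI_API_KEY"])` after python-dotenv's `load_dotenv()` has read the .env file, it stops the script before any API call is made.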

3. Find the competitor's key pages

Start with /search. Two queries use the site: operator to stay on the competitor's own domain and surface the pricing page and changelog; a third, unrestricted query looks for feature coverage and may return third-party pages. We dedupe by URL so a page returned by two queries is scraped only once.

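import os

import requests
from dotenv import load_dotenv

load_dotenv()  # read SUPERHIGHWAY_API_KEY and OPENAI_API_KEY from .env

SUPERHIGHWAY_KEY = os.getenv("SUPERHIGHWAY_API_KEY")
BASE = "https://superhighway.walls.sh"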

def find_competitor_pages(company: str, domain: str) -> list[dict]:
    """Find pricing, changelog, and feature pages."""
    queries = [
        f"site:{domain} pricing plans",
        f"site:{domain} changelog OR releases",
        f"{company} product features 2025",
    ]
    pages = []
    seen_urls = set()

    for q in queries:
        r = requests.get(
            f"{BASE}/search",
            params={"q": q, "limit": 3},
            headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
            timeout=30,
        )
        r.raise_for_status()  # fail loudly on auth or quota errors
        for result in r.json().get("results", []):
            if result["url"] not in seen_urls:
                seen_urls.add(result["url"])
                pages.append(result)

    return pages
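
The exact-match check on `result["url"]` treats `https://exa.ai/pricing` and `https://exa.ai/pricing/` as two different pages. A sketch of a looser dedupe key follows. It assumes that trailing slashes, fragments, and `utm_*` parameters do not change the page content, which is common but not guaranteed; `normalize_url` and `dedupe_results` are illustrative names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TRACKING_PREFIXES = ("utm_",)  # assumption: utm_* parameters are noise


def normalize_url(url: str) -> str:
    """Collapse trivially different URLs to one dedupe key."""
    parts = urlsplit(url)
    # Keep every query parameter except the tracking ones.
    query = urlencode(
        [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
         if not k.lower().startswith(TRACKING_PREFIXES)]
    )
    path = parts.path.rstrip("/") or "/"
    # Lowercase scheme and host, and drop the fragment.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, query, ""))


def dedupe_results(results: list[dict]) -> list[dict]:
    """Keep the first result for each normalized URL, preserving order."""
    seen = set()
    unique = []
    for result in results:
        key = normalize_url(result["url"])
        if key not in seen:
            seen.add(key)
            unique.append(result)
    return unique
```

The original URL is kept on each result, so the scraper still requests exactly what the search returned; only the comparison key is normalized.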

4. Scrape each page to Markdown

Search results give you titles and snippets, but a real pricing analysis needs the full page. /scrape returns the page as clean Markdown (no nav, no scripts, no cookie banners), so the LLM sees only the content that matters. We keep the first 3,000 characters of each page to stay inside the model's context budget.

def scrape_page(url: str) -> str:
    """Fetch a page as Markdown, truncated to 3,000 characters."""
    r = requests.get(
        f"{BASE}/scrape",
        params={"url": url},
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
        timeout=30,
    )
    r.raise_for_status()
    return r.json().get("markdown", "")[:3000]  # Truncate to fit context
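
A hard slice at a fixed character count can cut a pricing table row or a sentence in half. As one possible refinement, the sketch below backs up to the last paragraph or line break inside the budget. `truncate_markdown` is a hypothetical helper, and the "at least half the budget" threshold is an arbitrary choice.

```python
def truncate_markdown(text: str, limit: int = 3000) -> str:
    """Cut text to at most `limit` characters, preferring a paragraph break."""
    if len(text) <= limit:
        return text
    head = text[:limit]
    # Prefer the last blank line, then the last newline, inside the window.
    for separator in ("\n\n", "\n"):
        cut = head.rfind(separator)
        if cut > limit // 2:  # do not throw away most of the budget
            return head[:cut].rstrip()
    # No usable break found: fall back to the plain slice.
    return head
```

It could replace the `[:3000]` slice with `truncate_markdown(r.json().get("markdown", ""))`.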

5. Get recent news about the competitor

Pricing and feature pages tell you the present state; /news tells you what's moving. Funding rounds, product launches, and partnership announcements are exactly the signals a competitive report should flag.

def get_competitor_news(company: str) -> list[dict]:
    r = requests.get(
        f"{BASE}/news",
        params={"q": f"{company} product announcement funding", "count": 5},
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
        timeout=30,
    )
    r.raise_for_status()
    return r.json().get("articles", [])
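
The same announcement is often carried by several outlets, and repeated headlines waste prompt space. The sketch below drops repeats by normalized title. It relies only on the `title` field that the report generator already reads; `title_key` and `dedupe_articles` are illustrative names, and near-duplicates with reworded headlines will still get through.

```python
import re


def title_key(title: str) -> str:
    """Lowercase a headline and collapse punctuation and spacing."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def dedupe_articles(articles: list[dict]) -> list[dict]:
    """Keep the first article for each normalized title, preserving order."""
    seen = set()
    unique = []
    for article in articles:
        key = title_key(article.get("title", ""))
        # Articles without a usable title are skipped entirely.
        if key and key not in seen:
            seen.add(key)
            unique.append(article)
    return unique
```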

6. Deep research synthesis with /research

The competitor's own pages are inherently one-sided. /research reads across many independent sources — reviews, comparisons, third-party write-ups — and returns a synthesized view. This is where you catch the things a company won't say about itself.

def research_competitor(company: str) -> str:
    r = requests.get(
        f"{BASE}/research",
        params={"q": f"{company} product pricing features competitors", "pages": 5},
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
        timeout=60,  # assumed generous limit; tune to the latency you observe
    )
    r.raise_for_status()
    data = r.json()
    # Prefer the "synthesis" field, fall back to "markdown"; cap at 4,000 characters
    return data.get("synthesis", data.get("markdown", ""))[:4000]

7. Generate the competitive intelligence report

Now the LLM. Everything above is retrieval; this step is synthesis. We hand the model the scraped pages, the news headlines and descriptions, and the research synthesis, and ask it to write a fixed-structure report. The system prompt pins it to the sources ("based only on the provided sources"), which makes invented pricing or features less likely but does not rule them out, so spot-check figures against the scraped pages.

from openai import OpenAI

llm = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def generate_report(
    company: str,
    pages: list[dict],
    page_contents: dict[str, str],
    news: list[dict],
    research: str
) -> str:
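    # Build context: up to 4 pages, 1,000 characters each (tighter than the
    # 3,000 kept by scrape_page); use the search snippet if a scrape is empty
    pages_text = "\n\n".join(
        f"### {p['title']} ({p['url']})\n{(page_contents.get(p['url']) or p.get('content', ''))[:1000]}"
        for p in pages[:4]
    )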
    news_text = "\n".join(
        f"- {a['title']} ({a.get('source', '')}): {a.get('description', '')}"
        for a in news[:5]
    )

    response = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a competitive intelligence analyst. Write structured, factual reports based only on the provided sources."},
            {"role": "user", "content": f"""Analyze {company} as a competitor based on these sources:

## Web Pages
{pages_text}

## Recent News
{news_text}

## Research Synthesis
{research}

Write a competitive intelligence report with these sections:
1. **Pricing** — plans, prices, what's included
2. **Key Features** — main product capabilities
3. **Messaging & Positioning** — how they describe themselves
4. **Recent Changes** — new features, announcements, pricing changes
5. **Potential Gaps** — areas where they appear weak or silent
6. **Summary** — 2-3 sentence executive summary
"""}
        ]
    )
    return response.choices[0].message.content or ""  # content can be None
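
The prompt asks for six named sections, but nothing checks that the model delivered them. A rough sketch of such a check follows. It is a heuristic that looks for lines beginning with a section name after stripping heading marks, list numbers, and bold markers, so a body line that happens to start with "Pricing" would also count; `missing_sections` is a hypothetical helper.

```python
import re

REQUIRED_SECTIONS = [
    "Pricing",
    "Key Features",
    "Messaging & Positioning",
    "Recent Changes",
    "Potential Gaps",
    "Summary",
]


def missing_sections(report: str, required=REQUIRED_SECTIONS) -> list[str]:
    """Return required section names with no line starting with that name."""
    found = set()
    for line in report.splitlines():
        # Strip leading '#', digits, dots, brackets, bold and list markers.
        cleaned = re.sub(r"^[#\s\d.)*_-]+", "", line).lower()
        for name in required:
            if cleaned.startswith(name.lower()):
                found.add(name)
    return [name for name in required if name not in found]
```

A non-empty result is a reasonable trigger for one retry of the LLM call, or at least for a warning in the saved file.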

8. The full pipeline

Wire the steps together. The orchestrator finds pages, scrapes the top four, pulls news, runs deep research, and feeds everything to the report generator.

def analyze_competitor(company: str, domain: str) -> str:
    print(f"Analyzing {company}...")

    # Step 1: Find key pages
    pages = find_competitor_pages(company, domain)
    print(f"Found {len(pages)} pages")

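    # Step 2: Scrape each page; skip any that fail rather than abort the run
    page_contents = {}
    for page in pages[:4]:  # Limit to 4 pages
        try:
            page_contents[page["url"]] = scrape_page(page["url"])
        except (requests.RequestException, ValueError) as exc:
            print(f"Skipping {page['url']}: {exc}")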

    # Step 3: Get recent news
    news = get_competitor_news(company)

    # Step 4: Deep research
    research = research_competitor(company)

    # Step 5: Generate report
    report = generate_report(company, pages, page_contents, news, research)

    return report

if __name__ == "__main__":
    import sys
    company = sys.argv[1] if len(sys.argv) > 1 else "Exa"
    domain = sys.argv[2] if len(sys.argv) > 2 else "exa.ai"

    report = analyze_competitor(company, domain)
    print(report)

    # Save to file
    filename = f"{company.lower().replace(' ', '_')}_analysis.md"
    with open(filename, "w", encoding="utf-8") as f:
        f.write(f"# Competitive Analysis: {company}\n\n")
        f.write(report)
    print(f"\nSaved to {filename}")
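
The pipeline makes several network calls in sequence, and a single transient failure ends the run. One common mitigation is a small retry wrapper with exponential backoff, sketched below. `with_retries` is a hypothetical helper, the attempt count and delays are placeholder values, and it only helps when the wrapped step raises on failure (for example via `r.raise_for_status()`).

```python
import time


def with_retries(fn, attempts: int = 3, base_delay: float = 1.0, sleep=time.sleep):
    """Call fn(); on an exception wait base_delay * 2**n, then try again."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

Inside `analyze_competitor`, a step could be wrapped as `news = with_retries(lambda: get_competitor_news(company))`. The `sleep` parameter exists so the wrapper can be exercised in tests without real delays.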

9. Running it

Pass a company name and domain on the command line:

# Analyze a SaaS pricing competitor
python competitor_analysis.py "Firecrawl" "firecrawl.dev"

# Analyze a broader player
python competitor_analysis.py "Brave Search" "search.brave.com"

# Run on multiple competitors; each run writes its own <company>_analysis.md
for company_domain in "Exa exa.ai" "Tavily tavily.com"; do
    company=$(echo "$company_domain" | cut -d' ' -f1)
    domain=$(echo "$company_domain" | cut -d' ' -f2)
    python competitor_analysis.py "$company" "$domain"
done

10. Extending the agent

The single-run version is the foundation. From here, the high-value additions are:

Schedule it — run weekly with cron and you have a standing competitive feed instead of a one-off snapshot
Detect and alert on changes — diff this week's report against last week's and fire a notification when pricing or features move (pair it with the web change detection guide)
Store reports with timestamps — keep them in a database so you can track how a competitor's positioning drifts over months
Capture landing pages — add Superhighway's /images endpoint to snapshot competitor pages visually alongside the text
Build a comparison matrix — run across your whole competitive set and lay the reports side by side
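
The storage and change-detection ideas above can be sketched with the standard library alone. The example assumes SQLite as the database and a plain unified diff as the change signal; the table layout and the function names are illustrative, not a prescribed schema.

```python
import difflib
import sqlite3
from datetime import datetime, timezone


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the reports table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reports "
        "(id INTEGER PRIMARY KEY, company TEXT, created_at TEXT, body TEXT)"
    )
    return conn


def save_report(conn: sqlite3.Connection, company: str, report: str) -> None:
    """Append a report with a UTC timestamp."""
    conn.execute(
        "INSERT INTO reports (company, created_at, body) VALUES (?, ?, ?)",
        (company, datetime.now(timezone.utc).isoformat(), report),
    )
    conn.commit()


def diff_latest(conn: sqlite3.Connection, company: str) -> str:
    """Unified diff of the two newest reports, or '' if fewer than two exist."""
    rows = conn.execute(
        "SELECT body FROM reports WHERE company = ? ORDER BY id DESC LIMIT 2",
        (company,),
    ).fetchall()
    if len(rows) < 2:
        return ""
    new, old = rows[0][0], rows[1][0]
    return "\n".join(
        difflib.unified_diff(
            old.splitlines(), new.splitlines(),
            fromfile="previous", tofile="latest", lineterm="",
        )
    )
```

Because an LLM rewords its output from run to run, a diff of generated reports is likely to be noisy. Storing and diffing the scraped Markdown with the same functions may give a cleaner signal for pricing changes.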

11. Why four endpoints instead of one

You could try to do this with a single search call and a big prompt, but each endpoint earns its place:

/search finds the canonical pages — you don't have to guess the URL of a competitor's changelog.
/scrape gives the LLM clean Markdown instead of raw HTML, so pricing tables and feature lists survive intact and the context isn't wasted on markup.
/news surfaces time-sensitive signals — a launch or a funding round — that static pages won't tell you.
/research brings in the outside view, the third-party comparisons and reviews that reveal the gaps a company stays quiet about.

Together they give the LLM both the inside and outside picture, which is what separates a useful competitive report from a rephrased homepage.

12. Getting your API key

Grab a free Superhighway key at /pricing (1,000 calls/month, no credit card). For an agent that provisions its own access, skip the key entirely with x402: it pays $0.002 per call in USDC on Base — no signup, no key management. See the x402 pay-per-call guide for the wallet setup.

From here, the search-and-read guide goes deeper on combining /search and /scrape, and the web change detection guide shows how to turn this into a scheduled monitor that alerts you the moment a competitor changes.