Build a patent research agent


Patent intelligence — prior art search, freedom-to-operate (FTO) analysis, landscape mapping, competitor filing tracking — is some of the most expensive data in the innovation economy. Derwent, PatSnap, and law-firm retainers charge five and six figures a year for it. But most of the underlying signal is public: Google Patents, the USPTO, the EPO's Espacenet, academic publications, and standards documents are all free and searchable. This guide builds a Python agent that mines those public sources and assembles a structured patent intelligence brief. It chains all four Superhighway endpoints — /research for the patent landscape of a technology area, /search for specific patents and non-patent prior art, /scrape for claim language and citation networks, and /news for recent filings, litigation, and licensing deals — then uses an LLM to emit a structured brief as JSON.

This guide is for research and informational purposes only — it is not legal advice. It retrieves publicly available patent information. It does not constitute a freedom-to-operate opinion or a professional prior art search. Patent law is complex and jurisdiction-specific. Always consult a qualified patent attorney for FTO opinions, prior art searches, and any legal IP strategy decisions.

1. What you'll build

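A Python agent that takes a technology, an invention description, or a company name and produces a structured patent intelligence brief as JSON: key patent holders, key patents found, filing trends, technology clusters, a prior art summary, an FTO risk rating, the expiry landscape, recent activity, and search gaps.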

2. Setup

pip install openai requests python-dotenv

Create a .env file with your two keys:

SUPERHIGHWAY_API_KEY=your_key_here
OPENAI_API_KEY=your_key_here
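
A missing key otherwise only shows up later as an authentication error from one of the APIs. The sketch below is one way to fail fast once the .env file has been loaded (for example with python-dotenv's load_dotenv()). The helper name missing_keys is hypothetical and not part of either SDK.

```python
import os

# The two keys this guide expects to find in the environment.
REQUIRED_KEYS = ("SUPERHIGHWAY_API_KEY", "OPENAI_API_KEY")

def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the names of required keys that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not (env.get(name) or "").strip()]

if __name__ == "__main__":
    absent = missing_keys()
    if absent:
        raise SystemExit(f"Missing keys in environment: {', '.join(absent)}")
```

Passing a plain dict as env makes the check easy to exercise without touching the real environment.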

3. Research the patent landscape

Start with /research, which pulls multi-source background into one synthesis — who holds the IP in this space, how filing activity is trending, where the key claims cluster, and when major patents expire. This is the grounding context for every later step.

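import json
import os

import requests
from dotenv import load_dotenv

# Load SUPERHIGHWAY_API_KEY and OPENAI_API_KEY from the .env file created in Setup
load_dotenv()

SUPERHIGHWAY_KEY = os.getenv("SUPERHIGHWAY_API_KEY")
BASE = "https://superhighway.walls.sh"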

def research_landscape(technology_or_invention: str) -> str:
    """Deep synthesis: key holders, filing trends, key claims, prior art, expiry."""
    r = requests.get(
        f"{BASE}/research",
        params={
            "q": f"{technology_or_invention} patent technology landscape key patents prior art",
            "pages": 6
        },
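        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
        timeout=60,  # a multi-page synthesis can be slow; never wait forever
    )
    r.raise_for_status()  # surface auth or quota errors instead of parsing an error body
    data = r.json()
    # Fall back to "markdown" when "synthesis" is missing or empty
    return (data.get("synthesis") or data.get("markdown") or "")[:3000]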
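
Since every later step builds on this landscape call, a transient network failure here stops the whole run. The following is a minimal retry sketch with exponential backoff. The helper with_retries and its parameters are hypothetical additions, not part of the Superhighway API or of requests.

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0, retry_on=(Exception,), sleep=time.sleep):
    """Call fn(); on a listed exception wait base_delay * 2**i seconds and retry.

    The last failure is re-raised so the caller still sees the real error.
    """
    for i in range(attempts):
        try:
            return fn()
        except retry_on:
            if i == attempts - 1:
                raise
            sleep(base_delay * 2 ** i)
```

With the function above, one possible call is with_retries(lambda: research_landscape("solid-state battery"), retry_on=(requests.RequestException,)), which retries connection errors and timeouts but lets other exceptions propagate immediately.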

4. Find patents and prior art

Two /search calls do the legwork. The first looks for specific patents on Google Patents and Justia, including their claims, filing dates, and assignees. The second goes after non-patent prior art: academic papers, IEEE/ACM publications, and standards documents. Prior art matters because published work that predates a patent's priority date can limit what was patentable, and therefore what a competitor can enforce against you.

def find_patents(technology_or_invention: str) -> list[dict]:
    """Find specific patents on Google Patents and Justia."""
    r = requests.get(
        f"{BASE}/search",
        params={
            "q": f"{technology_or_invention} patent site:patents.google.com OR site:patents.justia.com claims filing",
            "limit": 6
        },
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
        timeout=30,
    )
    return r.json().get("results", [])

def find_prior_art(technology_or_invention: str) -> list[dict]:
    """Find non-patent prior art: academic papers, standards, specifications."""
    r = requests.get(
        f"{BASE}/search",
        params={
            "q": f"{technology_or_invention} prior art academic paper publication IEEE ACM standard specification",
            "limit": 6
        },
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
        timeout=30,
    )
    return r.json().get("results", [])
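
A search backend may treat site: operators as a hint rather than a hard filter, so results from other domains can slip through. Here is a small sketch of a post-filter that keeps only results hosted on the two patent sites named in the query and drops repeated URLs. It assumes each result carries a "url" key, as the pipeline later in this guide does. The helper name keep_patent_results is hypothetical.

```python
from urllib.parse import urlparse

# Hosts targeted by the find_patents query.
PATENT_HOSTS = ("patents.google.com", "patents.justia.com")

def keep_patent_results(results, hosts=PATENT_HOSTS):
    """Keep results whose URL is on a patent host, preserving order, without duplicates."""
    kept, seen = [], set()
    for item in results:
        url = item.get("url") or ""
        host = (urlparse(url).hostname or "").lower()
        if host in hosts and url not in seen:
            seen.add(url)
            kept.append(item)
    return kept
```

Filtering before scraping avoids spending /scrape calls on pages that are unlikely to contain claim language.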

5. Scrape patent pages

/scrape turns each candidate URL into clean, LLM-ready markdown. This is where the agent pulls actual claim language, filing dates, assignees, and the "cited by" citation network out of Google Patents entries, Justia pages, and patent office records.

def scrape_patent_page(url: str) -> dict:
    """Scrape a patent page for claims, filing dates, assignee, and citations."""
    r = requests.get(
        f"{BASE}/scrape",
        params={"url": url},
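        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
        timeout=30,
    )
    if not r.ok:
        # Return empty content so the pipeline skips this page instead of crashing
        return {"url": url, "title": "", "content": ""}
    data = r.json()
    return {
        "url": url,
        "title": data.get("title") or "",
        "content": (data.get("markdown") or "")[:2500],
    }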
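
Scraped markdown is free text, so it helps to pull out the publication numbers it actually mentions. Those can later be compared with the numbers an LLM reports. The sketch below uses a regular expression that covers a handful of country prefixes and common spellings. The pattern is an illustrative assumption, not an official numbering specification, and extract_patent_numbers is a hypothetical helper.

```python
import re

# Country prefix, digits (optionally grouped by "," or "/"), optional kind code such as B2.
_PATENT_RE = re.compile(
    r"\b(US|EP|WO|JP|CN|KR)\s?(\d{1,4}(?:[,/]?\d{3,})+)\s?([AB][1-9])?\b"
)

def extract_patent_numbers(text):
    """Return normalized publication numbers (e.g. US10123456B2) in order of first appearance."""
    found = []
    for country, digits, kind in _PATENT_RE.findall(text):
        number = country + re.sub(r"[,/]", "", digits) + kind
        if number not in found:
            found.append(number)
    return found
```

Running it over page["content"] for each scraped page gives a set of numbers that are known to appear in the sources.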

6. Get recent patent news

/news surfaces what just happened — high-profile new filings, litigation outcomes, IPR (inter partes review) proceedings, licensing deals, and granted patents. This is the time-sensitive layer that a static patent-database scan misses.

def get_patent_news(technology_or_invention: str) -> list[dict]:
    """Get recent patent filings, litigation, IPR proceedings, and licensing deals."""
    r = requests.get(
        f"{BASE}/news",
        params={
            "q": f"{technology_or_invention} patent filing litigation lawsuit licensing deal IPR",
            "count": 6
        },
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
        timeout=30,
    )
    return r.json().get("articles", [])
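
The news layer mixes several kinds of events: filings, litigation, IPR proceedings, and licensing deals. One way to make that mix easier to scan is a rough keyword tagger over headlines, sketched below. The keyword lists and the tag_article helper are illustrative assumptions, and keyword matching will misclassify some headlines.

```python
import re

# Tag name -> case-insensitive pattern. The lists are deliberately small.
NEWS_TAGS = {
    "litigation": r"\b(lawsuit|litigation|infring\w*|sues?|sued|verdict)\b",
    "ipr": r"\b(ipr|ptab|inter partes)\b",
    "licensing": r"\b(licens\w*|royalt\w*)\b",
    "filing": r"\b(fil(es|ed|ing)|application|granted?)\b",
}

def tag_article(title):
    """Return every matching tag for a headline, or ["other"] when none match."""
    tags = [
        tag for tag, pattern in NEWS_TAGS.items()
        if re.search(pattern, title, flags=re.IGNORECASE)
    ]
    return tags or ["other"]
```

For example, each article returned by get_patent_news can be annotated with tag_article(article.get("title", "")) before it is passed on.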

7. Generate the patent intelligence brief with an LLM

Now hand everything to the LLM. The system prompt forbids inventing patent numbers, assignees, or expiry dates and bars any legal conclusion — the agent summarizes what's in the sources, it does not advise. The output is structured JSON so it slots straight into a landscape memo, an FTO scoping document, or a dashboard.

from openai import OpenAI

llm = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

DISCLAIMER = (
    "This summary is for informational purposes only and does not constitute legal "
    "advice. Patent law is complex and jurisdiction-specific. Always consult a "
    "qualified patent attorney for freedom-to-operate opinions and IP strategy decisions."
)

def generate_brief(
    technology_or_invention: str,
    landscape: str,
    patent_pages: list[dict],
    prior_art: list[dict],
    news: list[dict],
    context: dict
) -> dict | None:
    """Generate a structured patent intelligence brief."""

    patent_text = "\n".join(
        f"- {p['title']}: {p['content'][:400]}"
        for p in patent_pages[:5]
        if p.get("content")
    )

    prior_art_text = "\n".join(
        f"- {r.get('title', '')}: {r.get('description', '')}"
        for r in prior_art[:5]
    )

    news_text = "\n".join(
        f"- {n.get('title', '')} ({n.get('source', '')})"
        for n in news[:6]
    )

    focus = context.get("focus", "landscape")
    jurisdiction = context.get("jurisdiction", "global")
    industry = context.get("industry", "general")

    response = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": f"""You are a patent analyst preparing a research brief.
The focus is a {focus} analysis for the {jurisdiction} jurisdiction in the {industry} industry.
Be specific and factual. Only use information from the provided sources.
Do not invent patent numbers, assignees, claim language, or expiry dates — if a detail
isn't in the sources, say 'not found in sources.' Do not make legal conclusions or give
freedom-to-operate opinions; you summarize public information, you do not advise."""
            },
            {
                "role": "user",
                "content": f"""Write a patent intelligence brief for: {technology_or_invention}

Patent Landscape:
{landscape[:2000]}

Patents Found (scraped):
{patent_text}

Non-Patent Prior Art:
{prior_art_text}

Recent Patent News:
{news_text}

Return JSON with:
- technology_area: string (the technology, invention, or company researched)
- key_patent_holders: list of 3-5 strings (major assignees, with rough portfolio size if mentioned)
- key_patents_found: list of 3-5 strings ("US10,XXX,XXX — [assignee] — [brief claim description] — expires [year]")
- filing_trends: string (is patenting increasing, decreasing, or shifting to new areas?)
- technology_clusters: list of 3-4 strings (sub-areas where patents cluster)
- prior_art_summary: string (pre-existing published work that might limit what's patentable)
- fto_risk_assessment: "high" | "medium" | "low" | "unclear" (based on density of active patents in the space)
- expiry_landscape: string (when do major patents expire? Any patent cliffs coming?)
- recent_activity: list of 3 strings (from news — new filings, litigation outcomes, licensing deals)
- search_gaps: list of 2-3 strings (what this search didn't cover — specific jurisdictions, trade secrets, design patents)"""
            }
        ],
        response_format={"type": "json_object"}
    )

    try:
        brief = json.loads(response.choices[0].message.content)
        brief["disclaimer"] = DISCLAIMER
        return brief
    except (json.JSONDecodeError, TypeError):
        # TypeError covers a None message content (e.g. a refusal) or non-object JSON
        return None
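
JSON mode yields syntactically valid JSON, but it does not by itself enforce the field list requested in the prompt. The sketch below checks a parsed brief against that list before it is used downstream. The helper validate_brief is hypothetical, and the expected types are read off the prompt above.

```python
# Field name -> expected Python type, mirroring the "Return JSON with" list in the prompt.
REQUIRED_FIELDS = {
    "technology_area": str,
    "key_patent_holders": list,
    "key_patents_found": list,
    "filing_trends": str,
    "technology_clusters": list,
    "prior_art_summary": str,
    "fto_risk_assessment": str,
    "expiry_landscape": str,
    "recent_activity": list,
    "search_gaps": list,
}
ALLOWED_RISK = {"high", "medium", "low", "unclear"}

def validate_brief(brief):
    """Return a list of human-readable problems; an empty list means the shape is as requested."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in brief:
            problems.append(f"missing field: {field}")
        elif not isinstance(brief[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    risk = brief.get("fto_risk_assessment")
    if isinstance(risk, str) and risk.lower() not in ALLOWED_RISK:
        problems.append(f"unexpected fto_risk_assessment: {risk!r}")
    return problems
```

A non-empty result is a reasonable trigger for retrying the LLM call or flagging the brief for manual review.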

8. The full research pipeline

Wire the steps together: research the landscape, find patents and prior art, scrape the top patent pages, pull news, then generate the brief. The context dict tells the agent whether you're mapping a landscape, scoping an FTO search, hunting prior art, or watching a competitor — and which jurisdiction and industry to slant toward.

def research_patent_landscape(
    technology_or_invention: str,
    context: dict | None = None,
    max_pages: int = 5
) -> dict | None:
    """
    Run the full patent research pipeline.

    context: {
        "focus": "landscape" | "fto" | "prior-art" | "competitor-watch",
        "jurisdiction": "US" | "EU" | "global",
        "industry": "software" | "biotech" | "electronics" | "mechanical" | "general"
    }
    """
    if context is None:
        context = {"focus": "landscape", "jurisdiction": "global", "industry": "general"}

    print(f"Researching patent landscape: {technology_or_invention}")

    # Step 1: Landscape synthesis
    print("Synthesizing patent landscape...")
    landscape = research_landscape(technology_or_invention)

    # Step 2: Find patents and prior art
    print("Finding patents and prior art...")
    patents = find_patents(technology_or_invention)
    prior_art = find_prior_art(technology_or_invention)

    # Step 3: Scrape the top patent pages
    print(f"Scraping {min(len(patents), max_pages)} patent pages...")
    patent_pages = []
    seen = set()
    for result in patents:
        url = result.get("url")
        if not url or url in seen:
            continue
        seen.add(url)
        page = scrape_patent_page(url)
        if page["content"]:
            patent_pages.append(page)
        if len(patent_pages) >= max_pages:
            break

    # Step 4: Recent patent news
    print("Pulling recent patent news...")
    news = get_patent_news(technology_or_invention)

    # Step 5: Generate the brief
    print("Generating patent intelligence brief...")
    return generate_brief(
        technology_or_invention, landscape, patent_pages, prior_art, news, context
    )

def print_brief(brief: dict):
    if not brief:
        print("Could not generate brief.")
        return

    print(f"\n{'='*60}")
    print(f"Patent Intelligence Brief: {brief.get('technology_area', 'Subject')}")
    # str(...) guards against a missing or null rating in the LLM output
    print(f"FTO risk assessment: {str(brief.get('fto_risk_assessment') or '?').upper()}")
    print(f"{'='*60}")

    holders = brief.get("key_patent_holders", [])
    if holders:
        print("\nKey Patent Holders:")
        for h in holders:
            print(f"  * {h}")

    patents = brief.get("key_patents_found", [])
    if patents:
        print("\nKey Patents Found:")
        for p in patents:
            print(f"  # {p}")

    print(f"\nFiling Trends:\n{brief.get('filing_trends', '')}")

    clusters = brief.get("technology_clusters", [])
    if clusters:
        print("\nTechnology Clusters:")
        for c in clusters:
            print(f"  - {c}")

    print(f"\nPrior Art Summary:\n{brief.get('prior_art_summary', '')}")
    print(f"\nExpiry Landscape:\n{brief.get('expiry_landscape', '')}")

    activity = brief.get("recent_activity", [])
    if activity:
        print("\nRecent Activity:")
        for a in activity:
            print(f"  ! {a}")

    gaps = brief.get("search_gaps", [])
    if gaps:
        print("\nSearch Gaps (not covered):")
        for g in gaps:
            print(f"  ? {g}")

    print(f"\n{brief.get('disclaimer', '')}")

if __name__ == "__main__":
    import sys

    # Usage: python agent.py "solid-state battery patents" landscape
    query = sys.argv[1] if len(sys.argv) > 1 else "solid-state battery patents"
    focus = sys.argv[2] if len(sys.argv) > 2 else "landscape"

    CONTEXT = {
        "focus": focus,
        "jurisdiction": "global",
        "industry": "general",
    }

    brief = research_patent_landscape(query, CONTEXT, max_pages=5)
    if brief:
        print_brief(brief)
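
The focus value comes straight from the command line, so a typo such as "landscpae" would flow into the system prompt unnoticed. A minimal sketch of a guard is shown below. It merges caller values over the defaults and rejects options outside the sets listed in the research_patent_landscape docstring. The helper normalize_context is hypothetical.

```python
# Allowed values, copied from the research_patent_landscape docstring.
ALLOWED = {
    "focus": {"landscape", "fto", "prior-art", "competitor-watch"},
    "jurisdiction": {"US", "EU", "global"},
    "industry": {"software", "biotech", "electronics", "mechanical", "general"},
}
DEFAULTS = {"focus": "landscape", "jurisdiction": "global", "industry": "general"}

def normalize_context(context=None):
    """Merge caller values over the defaults and raise ValueError on unknown options."""
    merged = {**DEFAULTS, **(context or {})}
    for key, allowed in ALLOWED.items():
        if merged[key] not in allowed:
            raise ValueError(f"{key}={merged[key]!r} is not one of {sorted(allowed)}")
    return merged
```

Calling normalize_context({"focus": focus}) before the pipeline starts turns a silent prompt error into an immediate, readable failure.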

9. Common research topics

10. Use cases

11. Extending the agent

12. Important disclaimer

This agent retrieves publicly available patent information for research and informational purposes. It is not legal advice and does not constitute a freedom-to-operate opinion or a professional prior art search. Patent law is complex, jurisdiction-specific, and constantly evolving. Always consult a qualified patent attorney before making IP strategy decisions, launching a product, or concluding that a patent space is clear.

13. Getting your API key

Grab a free Superhighway key at /pricing (1,000 calls/month, no credit card). For an agent that provisions its own access, skip the key entirely with x402: the agent pays $0.002 per call in USDC on Base, with no signup and no key management. See the x402 pay-per-call guide for the wallet setup.

For related builds, the drug discovery research agent applies the same four-endpoint pattern to drug pipelines and clinical trials, and the competitor analysis agent uses it on your market rivals.