Build a regulatory research agent

Mapping a regulatory landscape means jumping between statutes, agency guidance, compliance checklists, and news about fresh enforcement actions, then turning all of it into something a team can act on. This guide builds a Python agent that runs that loop and produces a structured regulatory brief. It chains all four Superhighway endpoints: /research for a multi-source synthesis of a regulation's scope and enforcement trend, /search to find primary regulations and compliance checklists, /scrape to pull specific requirements and penalty schedules off agency pages, and /news for recent enforcement actions and proposed rules. An LLM then turns the collected material into a brief covering scope, core requirements, a compliance checklist, penalty exposure, recent enforcement, and open questions.

⚠️ Not legal advice. This agent retrieves and summarizes publicly available regulatory information. It is not a substitute for qualified legal counsel. Always consult a licensed attorney for advice specific to your situation before making any compliance decision.

1. What you'll build

A Python agent that takes a regulatory topic or compliance question and produces a structured regulatory brief:

Synthesizes the regulatory landscape deeply — history, scope, and enforcement trend
Finds the primary regulations, official guidance, and compliance requirements
Scrapes regulatory pages for specific requirements and penalty schedules
Pulls recent enforcement actions and regulatory news
Uses an LLM to generate a compliance-focused research brief with a checklist

2. Setup

pip install openai requests python-dotenv

Create a .env file with your two keys:

SUPERHIGHWAY_API_KEY=your_key_here
OPENAI_API_KEY=your_key_here

3. Research the regulatory landscape

Start with /research. One call pulls multiple sources into a synthesis of the regulatory framework — its history, scope, who it applies to, and how enforcement has trended — so the LLM has grounded context instead of guessing.

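import json
import os

import requests
from dotenv import load_dotenv

# Load SUPERHIGHWAY_API_KEY and OPENAI_API_KEY from the .env file created in Setup
load_dotenv()

SUPERHIGHWAY_KEY = os.getenv("SUPERHIGHWAY_API_KEY")
BASE = "https://superhighway.walls.sh"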

def research_regulation(topic: str, jurisdiction: str = "US") -> str:
    """Deep synthesis of the regulatory framework, history, and enforcement trends."""
    r = requests.get(
        f"{BASE}/research",
        params={
            "q": f"{topic} regulation compliance requirements {jurisdiction} enforcement",
            "pages": 6
        },
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"}
    )
    data = r.json()
    return data.get("synthesis", data.get("markdown", ""))[:3000]
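
The function above reads the JSON body without checking whether the call succeeded. A minimal sketch of a more defensive reader follows, assuming the response object behaves like requests.Response (a status_code attribute and a json() method). The helper names read_json and pick_text, and the choice to return an empty dict on failure, are illustrative and not part of the Superhighway API.

```python
from typing import Any


def read_json(response: Any) -> dict:
    """Return the decoded JSON body, or {} if the call failed or the body is not a JSON object.

    Hypothetical helper: `response` is assumed to expose `status_code` and
    `json()`, as requests.Response does.
    """
    if not 200 <= response.status_code < 300:
        return {}
    try:
        data = response.json()
    except ValueError:  # invalid JSON surfaces as a ValueError subclass
        return {}
    return data if isinstance(data, dict) else {}


def pick_text(data: dict, limit: int = 3000) -> str:
    """Prefer the 'synthesis' field, fall back to 'markdown', and tolerate None values."""
    text = data.get("synthesis") or data.get("markdown") or ""
    return str(text)[:limit]
```

Inside research_regulation, `return pick_text(read_json(r))` could stand in for the last two lines. Passing a timeout to requests.get is also worth considering, since requests applies no timeout unless one is given.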

4. Find primary regulations and official guidance

This step makes two narrow /search calls. The first hunts for the authoritative sources: primary regulations, statutes, and official agency guidance. The second looks for compliance checklists and requirements summaries that translate the rules into actionable steps.

def find_regulations(topic: str, jurisdiction: str = "US") -> list[dict]:
    """Find the primary regulations, statutes, and official agency guidance."""
    r = requests.get(
        f"{BASE}/search",
        params={
            # No "federal" keyword here: it would skew results for non-US jurisdictions
            "q": f"{topic} {jurisdiction} regulation statute compliance requirements official",
            "limit": 8
        },
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"}
    )
    return r.json().get("results", [])

def find_compliance_checklists(topic: str) -> list[dict]:
    """Find compliance checklists, guides, and requirements summaries."""
    r = requests.get(
        f"{BASE}/search",
        params={
            "q": f"{topic} compliance checklist requirements guide penalties",
            "limit": 5
        },
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"}
    )
    return r.json().get("results", [])
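
Search results can mix agency pages with vendor blogs. As a sketch of how one might put authoritative sources first before scraping, the helper below deduplicates results by URL and moves hosts that match an allowlist of official suffixes to the front. The suffix list and the function names are assumptions for illustration; adjust them to the jurisdictions you cover.

```python
from urllib.parse import urlparse

# Illustrative allowlist, not exhaustive: host suffixes treated as official sources.
OFFICIAL_SUFFIXES = (".gov", ".gov.uk", ".europa.eu", ".gc.ca")


def is_official(url: str) -> bool:
    """True if the URL's host ends with one of the allowlisted suffixes."""
    host = (urlparse(url).hostname or "").lower()
    return any(host.endswith(suffix) for suffix in OFFICIAL_SUFFIXES)


def official_first(results: list[dict]) -> list[dict]:
    """Drop duplicate or missing URLs, then move official hosts to the front."""
    seen: set[str] = set()
    unique = []
    for result in results:
        url = result.get("url", "")
        if url and url not in seen:
            seen.add(url)
            unique.append(result)
    # sorted() is stable, so the search ranking is preserved within each group
    return sorted(unique, key=lambda r: not is_official(r["url"]))
```

In the pipeline, this could wrap the return value of find_regulations so that the max_pages scrape budget is spent on primary sources first.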

5. Scrape regulatory pages

/scrape turns each regulatory or compliance page into clean, LLM-ready markdown — specific requirements, penalty schedules, and definitions — so the LLM summarizes real content, not just a title.

def scrape_regulatory_page(url: str) -> dict:
    """Scrape a regulatory or compliance page for specific requirements."""
    r = requests.get(
        f"{BASE}/scrape",
        params={"url": url},
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"}
    )
    data = r.json()
    return {
        "url": url,
        "title": data.get("title", ""),
        "content": data.get("markdown", "")[:2500],
    }
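
The [:2500] slice can cut a requirement or a penalty figure in half. One possible refinement, sketched below under the assumption that the returned markdown separates paragraphs with blank lines, trims at the last paragraph break that fits within the limit. truncate_at_paragraph is a hypothetical helper, not something the API provides.

```python
def truncate_at_paragraph(text: str, limit: int = 2500) -> str:
    """Cut `text` to at most `limit` characters, preferring a paragraph boundary."""
    if len(text) <= limit:
        return text
    window = text[:limit]
    cut = window.rfind("\n\n")
    # Fall back to a hard cut when there is no paragraph break in the window,
    # or when the break sits so early that most of the budget would be wasted.
    if cut < limit // 2:
        return window
    return window[:cut]
```

Swapping `data.get("markdown", "")[:2500]` for `truncate_at_paragraph(data.get("markdown", ""))` keeps the same size budget while avoiding most mid-sentence cuts.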

6. Get recent enforcement and regulatory news

/news surfaces the time-sensitive layer a statute can't give you — fresh enforcement actions, fines, proposed rules, and penalty announcements.

def get_regulatory_news(topic: str, jurisdiction: str = "US") -> list[dict]:
    """Get recent enforcement actions, proposed rules, and regulatory changes."""
    r = requests.get(
        f"{BASE}/news",
        params={
            "q": f"{topic} {jurisdiction} enforcement action fine penalty regulation update",
            "count": 6
        },
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"}
    )
    return r.json().get("articles", [])

7. Generate the regulatory brief with an LLM

Now hand everything to the LLM. The system prompt pins the output to the reader's role, forbids legal advice and outcome predictions, and restricts the model to information found in the provided sources. The output is structured JSON, so it slots straight into a doc, checklist, or compliance tracker.

from openai import OpenAI

llm = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def generate_regulatory_brief(
    topic: str,
    jurisdiction: str,
    landscape_research: str,
    regulations: list[dict],
    checklists: list[dict],
    news: list[dict],
    context: dict
) -> dict | None:
    """Generate a structured regulatory research brief."""

    reg_text = "\n".join(
        f"- {r['title'][:80]}: {r['content'][:400]}"
        for r in regulations[:5]
        if r.get("content")
    )

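    # Include the scraped content, not just the title, so the LLM sees the actual checklist items
    checklist_text = "\n".join(
        f"- {c['title'][:80]}: {c['content'][:400]}"
        for c in checklists[:3]
        if c.get("content")
    )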

    news_text = "\n".join(
        f"- {n.get('title', '')} ({n.get('source', '')})"
        for n in news[:5]
    )

    audience = context.get("audience", "compliance professional")
    company_type = context.get("company_type", "business")

    response = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": f"""You are a regulatory research assistant helping a {audience}.
Write a clear, structured regulatory research brief. Only report information from the provided sources.
Do NOT provide legal advice or predict legal outcomes.
Be factual about regulatory requirements as publicly stated.
If a requirement is jurisdiction-specific, say so."""
            },
            {
                "role": "user",
                "content": f"""Research brief on: {topic} ({jurisdiction})

Audience: {audience} at a {company_type}

Regulatory Landscape:
{landscape_research[:2000]}

Primary Regulations & Official Guidance Found:
{reg_text}

Compliance Checklists & Guides:
{checklist_text}

Recent Enforcement & News:
{news_text}

Return JSON with:
- topic: string
- jurisdiction: string
- scope: string (1-2 sentences — what this regulation covers and who it applies to)
- key_regulatory_bodies: list of 2-3 strings (agencies/bodies responsible for enforcement)
- core_requirements: list of 4-6 strings (primary compliance obligations)
- compliance_checklist: list of 5-7 strings (actionable items to verify compliance)
- penalty_exposure: string (what non-compliance risks — financial, operational, reputational)
- recent_enforcement: list of 2-3 strings (from the news — recent actions, fines, proposed rules)
- key_deadlines_or_thresholds: list of 2-3 strings (important dates, revenue thresholds, employee counts that trigger requirements)
- open_questions: list of 3 strings (areas of regulatory uncertainty or active debate)
- next_steps: list of 3 strings (recommended actions for a {audience})
- disclaimer: "This summary is for informational purposes only and does not constitute legal advice. Consult a qualified attorney for advice specific to your situation." """
            }
        ],
        response_format={"type": "json_object"}
    )

    try:
        return json.loads(response.choices[0].message.content)
    except (json.JSONDecodeError, TypeError, IndexError):
        # TypeError covers a None message body; IndexError covers an empty choices list
        return None
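
JSON mode guarantees a parseable object but does not enforce the field list in the prompt, so a key can be missing or arrive with the wrong type. A minimal sketch of a post-check follows, using the field names from the prompt above. The function name and the decision to return a list of problems rather than raise are illustrative choices.

```python
# Field names copied from the prompt; the types are what the prompt asks for.
EXPECTED_FIELDS = {
    "topic": str,
    "jurisdiction": str,
    "scope": str,
    "key_regulatory_bodies": list,
    "core_requirements": list,
    "compliance_checklist": list,
    "penalty_exposure": str,
    "recent_enforcement": list,
    "key_deadlines_or_thresholds": list,
    "open_questions": list,
    "next_steps": list,
    "disclaimer": str,
}


def brief_problems(brief: dict) -> list[str]:
    """Return human-readable problems; an empty list means the shape looks right."""
    problems = []
    for field, expected in EXPECTED_FIELDS.items():
        if field not in brief:
            problems.append(f"missing field: {field}")
        elif not isinstance(brief[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    return problems
```

A caller could log the problems and retry the LLM call once, or pass the brief through with the gaps noted. Either way, the check only confirms shape, not whether the content is accurate.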

8. Wire up the full pipeline

The orchestrator runs research, search, scrape, and news, then hands the lot to the LLM. A context dict tunes the brief to a jurisdiction, audience, and company type.

def research_regulatory_topic(
    topic: str,
    context: dict | None = None,
    max_pages: int = 5
) -> dict | None:
    """
    Research a regulatory or compliance topic.

    topic: e.g. "GDPR data privacy", "SOC 2 compliance", "OSHA workplace safety",
           "FTC dark patterns", "SEC disclosure requirements"
    context: {
        "jurisdiction": "US" | "EU" | "UK" | "California",
        "audience": "compliance officer" | "startup founder" | "legal team",
        "company_type": "SaaS startup" | "healthcare company" | "financial institution"
    }
    """
    if context is None:
        context = {
            "jurisdiction": "US",
            "audience": "compliance officer",
            "company_type": "technology company"
        }

    jurisdiction = context.get("jurisdiction", "US")
    print(f"Researching: {topic} ({jurisdiction})")

    # Deep regulatory research
    print("Synthesizing regulatory landscape...")
    landscape = research_regulation(topic, jurisdiction)

    # Find regulations and checklists
    print("Finding primary regulations...")
    regulations_raw = find_regulations(topic, jurisdiction)
    checklists_raw = find_compliance_checklists(topic)

    # Scrape key pages
    print(f"Scraping {min(len(regulations_raw), max_pages)} regulatory pages...")
    regulations = []
    for result in regulations_raw[:max_pages]:
        page = scrape_regulatory_page(result["url"])
        if page["content"]:
            page["title"] = page["title"] or result.get("title", "")
            regulations.append(page)

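    checklists = []
    for result in checklists_raw[:2]:
        page = scrape_regulatory_page(result["url"])
        if page["content"]:
            # Fall back to the search result's title when the scraped page has none
            page["title"] = page["title"] or result.get("title", "")
            checklists.append(page)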

    # Recent enforcement news
    print("Fetching recent enforcement actions...")
    news = get_regulatory_news(topic, jurisdiction)

    # Generate brief
    print("Generating regulatory brief...")
    return generate_regulatory_brief(
        topic, jurisdiction, landscape, regulations, checklists, news, context
    )

def print_brief(brief: dict):
    if not brief:
        print("Could not generate brief.")
        return

    print(f"\n{'='*60}")
    print(f"Regulatory Brief: {brief.get('topic', 'Topic')} ({brief.get('jurisdiction', '?')})")
    print(f"{'='*60}")
    print(f"\n! {brief.get('disclaimer', '')}\n")
    print(f"Scope: {brief.get('scope', '')}\n")

    bodies = brief.get("key_regulatory_bodies", [])
    if bodies:
        print(f"Regulatory Bodies: {', '.join(bodies)}\n")

    reqs = brief.get("core_requirements", [])
    if reqs:
        print("Core Requirements:")
        for req in reqs:
            print(f"  * {req}")

    checklist = brief.get("compliance_checklist", [])
    if checklist:
        print("\nCompliance Checklist:")
        for item in checklist:
            print(f"  [ ] {item}")

    exposure = brief.get("penalty_exposure", "")
    if exposure:
        print(f"\nPenalty Exposure: {exposure}")

    enforcement = brief.get("recent_enforcement", [])
    if enforcement:
        print("\nRecent Enforcement:")
        for e in enforcement:
            print(f"  ! {e}")

    deadlines = brief.get("key_deadlines_or_thresholds", [])
    if deadlines:
        print("\nKey Thresholds & Deadlines:")
        for d in deadlines:
            print(f"  -> {d}")

    questions = brief.get("open_questions", [])
    if questions:
        print("\nOpen Questions:")
        for q in questions:
            print(f"  ? {q}")

    next_steps = brief.get("next_steps", [])
    if next_steps:
        print("\nNext Steps:")
        for s in next_steps:
            print(f"  >> {s}")

if __name__ == "__main__":
    import sys
    topic = " ".join(sys.argv[1:]) if len(sys.argv) > 1 else "GDPR data privacy"

    CONTEXT = {
        "jurisdiction": "EU",
        "audience": "compliance officer",
        "company_type": "SaaS startup"
    }

    brief = research_regulatory_topic(topic, CONTEXT, max_pages=5)
    # print_brief handles None itself and reports that no brief could be generated
    print_brief(brief)

9. Use cases

New market entry — research the regulatory landscape before launching in a new jurisdiction (GDPR for the EU, PIPEDA for Canada, CCPA for California).
Compliance gap analysis — run the agent and diff the compliance_checklist against what your program already documents.
Regulatory change monitoring — schedule weekly to watch for new enforcement actions or proposed rules in your area.
Startup due diligence — investors and founders can quickly map regulatory exposure for a new product category.
Policy research — researchers and journalists tracking how regulations evolve in specific sectors.
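
The compliance gap analysis above can start as a plain set comparison. The sketch below assumes you keep your documented controls as a list of strings and matches them to checklist items by normalized text. That matching is crude: LLM-written items rarely mirror your own wording exactly, so treat the result as a list to review by hand. checklist_gaps is a hypothetical helper.

```python
def _normalize(item: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences do not count."""
    return " ".join(item.lower().split())


def checklist_gaps(checklist: list[str], documented: list[str]) -> list[str]:
    """Return checklist items with no exact (normalized) match among documented controls."""
    have = {_normalize(d) for d in documented}
    return [item for item in checklist if _normalize(item) not in have]
```

Calling `checklist_gaps(brief["compliance_checklist"], documented_controls)` returns the items to investigate, in the order the brief listed them.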

10. Common regulatory topics

Data privacy — GDPR, CCPA, PIPEDA, LGPD, India DPDP Act.
AI governance — EU AI Act, FTC guidance, NIST AI RMF.
Financial services — SOC 2, PCI-DSS, KYC/AML requirements, SEC disclosure.
Healthcare — HIPAA, FDA medical device regulation, CMS requirements.
Workplace — OSHA, EEOC guidance, state-level labor laws.

11. Extending the agent

Multi-jurisdiction comparison — run research_regulatory_topic() for GDPR + CCPA + PIPEDA and compare compliance_checklist items side by side.
Change detection — store the brief as JSON; rerun weekly and diff recent_enforcement + open_questions to catch what changed.
Gap analysis integration — export compliance_checklist to a spreadsheet and track completion status per item.
Regulatory calendar — query /news?q={topic}+proposed+rule+comment+period to track upcoming public comment deadlines.
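
As a sketch of the change detection idea, the helper below compares two stored briefs and reports entries in recent_enforcement and open_questions that are new in the later run. It assumes both briefs are dicts in the shape produced by generate_regulatory_brief, with plain strings as list items. Because the LLM may reword the same item between runs, an exact string comparison will over-report. new_entries is an illustrative name.

```python
import json

WATCHED_FIELDS = ("recent_enforcement", "open_questions")


def new_entries(previous: dict, current: dict) -> dict[str, list[str]]:
    """For each watched field, list the items present in `current` but not in `previous`."""
    changes = {}
    for field in WATCHED_FIELDS:
        old = set(previous.get(field) or [])
        added = [item for item in (current.get(field) or []) if item not in old]
        if added:
            changes[field] = added
    return changes


# Example with two in-memory briefs; in practice `last_week` would come from
# json.load() on the file saved by the previous run.
last_week = {"recent_enforcement": ["Fine A"], "open_questions": ["Q1"]}
this_week = {"recent_enforcement": ["Fine A", "Fine B"], "open_questions": ["Q1"]}
print(json.dumps(new_entries(last_week, this_week), indent=2))
```

An empty result means nothing new appeared in the watched fields, which makes it a natural trigger for deciding whether a weekly run deserves a notification.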

⚠️ Important disclaimer. This agent retrieves and summarizes publicly available regulatory information. It is not a substitute for qualified legal counsel and does not constitute legal advice. Regulatory requirements vary by jurisdiction, industry, and individual circumstances. Always consult a licensed attorney before making compliance decisions.

12. Getting your API key

Grab a free Superhighway key at /pricing (1,000 calls/month, no credit card). For an agent that provisions its own access, skip the key entirely with x402: it pays $0.002 per call in USDC on Base — no signup, no key management. See the x402 pay-per-call guide for the wallet setup.

For related builds, the financial research agent uses the same four-endpoint pattern on companies and markets, and the academic research agent covers the literature-review synthesis pattern.