Build a cybersecurity research agent

Researching a threat means jumping between CVE databases, vendor advisories, security-research write-ups, and fresh disclosure news, then turning all of it into something a security team can act on. This guide builds a Python agent that runs that loop and emits a structured threat intelligence report. It chains all four Superhighway endpoints: /research for a multi-source synthesis of a vulnerability or attack technique, /search to find specific CVE entries, advisories, and PoC details, /scrape to pull technical specifics off NVD/MITRE/vendor pages, and /news for recent disclosures, patches, and exploitation-in-the-wild reports. An LLM then turns the gathered material into a report covering threat type, severity, affected systems, attack vector, IOCs, mitigation steps, patch status, and a risk score.

⚠️ Responsible use. This agent retrieves and summarizes publicly available security information for defensive research — understanding threats so you can build better defenses. Always comply with applicable laws and your organization's security policy. Verify findings with authoritative sources (NIST NVD, vendor advisories, MITRE ATT&CK). This tool does not assist with offensive testing, unauthorized access, or illegal activity. If you discover an undisclosed vulnerability, follow responsible disclosure practices.

1. What you'll build

A Python agent that takes a security topic, CVE ID, threat actor, or attack technique and produces a structured threat intelligence report:

Synthesizes the threat landscape, affected systems, and mitigation options from multiple sources
Finds specific CVE details, security advisories, and vendor bulletins
Scrapes NVD/MITRE/vendor pages for technical specifics — CVSS, CWE, affected versions
Pulls recent vulnerability news, patch announcements, and breach reports
Uses an LLM to generate a structured threat intelligence report with mitigations and IOCs

2. Setup

pip install openai requests python-dotenv

Create a .env file with your two keys:

SUPERHIGHWAY_API_KEY=your_key_here
OPENAI_API_KEY=your_key_here
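
A quick startup check can confirm both keys are actually visible to the process before any call goes out. The sketch below assumes the two variable names from the .env file above; `missing_keys` and `REQUIRED_KEYS` are hypothetical names used for illustration. Keep in mind that python-dotenv only populates the environment once `load_dotenv()` has been called.

```python
import os

# Variable names taken from the .env file in this guide
REQUIRED_KEYS = ("SUPERHIGHWAY_API_KEY", "OPENAI_API_KEY")

def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the required variable names that are unset or blank.

    `env` defaults to os.environ; passing a plain dict makes the check
    easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    return [name for name in required if not (env.get(name) or "").strip()]
```

Calling `missing_keys()` right after `load_dotenv()` and stopping when the list is non-empty gives a clearer failure than a 401 from the first API call.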

3. Deep threat research

Start with /research. One call pulls multiple sources into a synthesis of the threat — how the attack works, what it affects, and the mitigation landscape — so the LLM has grounded context instead of guessing.

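import json
import os

import requests
from dotenv import load_dotenv

load_dotenv()  # read SUPERHIGHWAY_API_KEY and OPENAI_API_KEY from .env

SUPERHIGHWAY_KEY = os.getenv("SUPERHIGHWAY_API_KEY")
BASE = "https://superhighway.walls.sh"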

def research_threat(topic: str) -> str:
    """Deep synthesis of the threat, attack vectors, and mitigation landscape."""
    r = requests.get(
        f"{BASE}/research",
        params={
            "q": f"{topic} vulnerability attack vector mitigation CVE",
            "pages": 6
        },
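        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
        timeout=60,  # requests has no default timeout; avoid hanging forever
    )
    r.raise_for_status()  # surface auth or quota errors instead of parsing an error body
    data = r.json()
    # Prefer the synthesis; fall back to raw markdown if it is absent or empty
    return (data.get("synthesis") or data.get("markdown") or "")[:3000]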
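
A fixed character slice can cut the synthesis mid-sentence, which hands the LLM a dangling fragment. One possible refinement, sketched below, is to back up to the last paragraph or sentence boundary inside the limit. `trim_to_boundary` is a hypothetical helper, and the rule that a boundary must fall in the second half of the window is an arbitrary assumption chosen to avoid discarding most of the text.

```python
def trim_to_boundary(text: str, limit: int = 3000) -> str:
    """Cut text to at most `limit` characters, ending on a paragraph or
    sentence boundary when one exists in the second half of the window."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # Prefer a paragraph break (blank line)
    para = cut.rfind("\n\n")
    if para >= limit // 2:
        return cut[:para].rstrip()
    # Otherwise fall back to the end of the last complete sentence
    sentence = cut.rfind(". ")
    if sentence >= limit // 2:
        return cut[: sentence + 1]
    # No usable boundary: keep the plain hard cut
    return cut
```

It could stand in for the `[:3000]` slice in research_threat, and for the `[:2500]` slice used when scraping pages.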

4. Find CVEs, advisories, and vendor bulletins

Two narrow /search calls from different angles: one hunts authoritative CVE entries, NIST/NVD records, and vendor advisories; the other looks for proof-of-concept and exploitation write-ups from security researchers — the technical detail you won't find in a vendor bulletin.

def find_advisories(topic: str) -> list[dict]:
    """Find CVE entries, NVD records, and official vendor security advisories."""
    r = requests.get(
        f"{BASE}/search",
        params={
            "q": f"{topic} CVE NVD MITRE vendor security advisory bulletin official",
            "limit": 8
        },
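        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
        timeout=30,  # requests has no default timeout
    )
    r.raise_for_status()  # fail loudly on auth or quota errors
    return r.json().get("results", [])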

def find_technical_details(topic: str) -> list[dict]:
    """Find PoC and exploitation/analysis write-ups from security researchers."""
    r = requests.get(
        f"{BASE}/search",
        params={
            "q": f"{topic} proof of concept exploit technical analysis security research writeup",
            "limit": 5
        },
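        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
        timeout=30,  # requests has no default timeout
    )
    r.raise_for_status()  # fail loudly on auth or quota errors
    return r.json().get("results", [])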
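
Because the two searches approach the same topic from different angles, they can return the same page twice, and each duplicate would cost a scrape call later. A minimal de-duplication sketch follows. It assumes each result is a dict with a "url" key, as the pipeline in this guide does; `dedupe_by_url` is a hypothetical helper, and its normalization (lower-cased host, trailing slash ignored, scheme and fragment ignored) is one reasonable choice among several.

```python
from urllib.parse import urlsplit

def dedupe_by_url(results):
    """Drop results without a URL and keep the first result per normalized URL."""
    seen = set()
    unique = []
    for item in results:
        url = (item.get("url") or "").strip()
        if not url:
            continue  # nothing to scrape
        parts = urlsplit(url)
        # Host is case-insensitive; treat /path and /path/ as the same page
        key = (parts.netloc.lower(), parts.path.rstrip("/"), parts.query)
        if key in seen:
            continue
        seen.add(key)
        unique.append(item)
    return unique
```

Running it over the combined output of find_advisories and find_technical_details before scraping keeps the page budget for distinct sources.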

5. Scrape technical details

/scrape turns each advisory or research page into clean, LLM-ready markdown — CVSS scores, CWE classifications, affected versions, and remediation guidance — so the LLM summarizes real content, not just a title.

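def scrape_security_page(url: str) -> dict:
    """Scrape an advisory, CVE, or research page for technical specifics."""
    try:
        r = requests.get(
            f"{BASE}/scrape",
            params={"url": url},
            headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
            timeout=30,  # requests has no default timeout
        )
        r.raise_for_status()
        data = r.json()
    except (requests.RequestException, ValueError):
        # One unreachable page should not abort the whole run;
        # empty content lets the caller skip this page
        data = {}
    return {
        "url": url,
        "title": data.get("title") or "",
        "content": (data.get("markdown") or "")[:2500],
    }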
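
Identifiers such as CVE and CWE numbers follow fixed textual patterns, so they can also be pulled out of the scraped markdown deterministically and used to cross-check what the LLM later reports. The sketch below is an illustration, not part of Superhighway: `extract_identifiers` is a hypothetical helper, and the patterns cover only the standard `CVE-YYYY-NNNN` (four or more trailing digits) and `CWE-N` forms.

```python
import re

CVE_RE = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)
CWE_RE = re.compile(r"\bCWE-\d+\b", re.IGNORECASE)

def extract_identifiers(markdown: str) -> dict:
    """Return unique CVE and CWE identifiers in order of first appearance."""
    def unique_upper(pattern):
        found = []
        for match in pattern.findall(markdown):
            ident = match.upper()  # normalize "cve-..." to "CVE-..."
            if ident not in found:
                found.append(ident)
        return found

    return {"cves": unique_upper(CVE_RE), "cwes": unique_upper(CWE_RE)}
```

Applied to the "content" field returned by scrape_security_page, this gives a list to compare against the report's references field.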

6. Get recent threat news

/news surfaces the time-sensitive layer a CVE record can't give you — fresh patches, exploitation in the wild, zero-day disclosures, and breach reports.

def get_threat_news(topic: str) -> list[dict]:
    """Get recent disclosures, patches, exploits in the wild, and breach reports."""
    r = requests.get(
        f"{BASE}/news",
        params={
            "q": f"{topic} vulnerability patch breach zero-day",
            "count": 6
        },
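        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
        timeout=30,  # requests has no default timeout
    )
    r.raise_for_status()  # fail loudly on auth or quota errors
    return r.json().get("articles", [])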
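
All four endpoint functions share one shape: a GET against BASE with a bearer header, then a JSON parse. One way to centralize timeouts and error handling is a small wrapper like the sketch below. `api_get`, its `get` parameter, and the 30-second default are assumptions made for illustration; how Superhighway formats error responses is not covered in this guide, so the wrapper simply returns an empty dict on any failure.

```python
import os

BASE = "https://superhighway.walls.sh"

def api_get(path, params, get=None, timeout=30):
    """GET a Superhighway endpoint; return parsed JSON, or {} on any failure.

    `get` is a hypothetical injection point (defaults to requests.get) so the
    helper can be exercised without network access.
    """
    if get is None:
        import requests  # imported lazily; only needed for real calls
        get = requests.get
    headers = {"Authorization": f"Bearer {os.getenv('SUPERHIGHWAY_API_KEY', '')}"}
    try:
        resp = get(f"{BASE}{path}", params=params, headers=headers, timeout=timeout)
        resp.raise_for_status()  # turn 4xx/5xx responses into an exception
        data = resp.json()
    except Exception as exc:  # network error, HTTP error, or invalid JSON
        print(f"{path} failed: {exc}")
        return {}
    # The endpoint functions all call .get() on the result, so insist on a dict
    return data if isinstance(data, dict) else {}
```

With this in place, get_threat_news would reduce to `api_get("/news", {"q": ..., "count": 6}).get("articles", [])`, and the other three functions shrink the same way.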

7. Generate the threat intelligence report with an LLM

Now hand everything to the LLM. The system prompt pins the output to defensive research, forbids operational attack guidance, and tells the model to report only what the sources support and to say so when a detail such as CVSS, CWE, or patch status is missing rather than guess. Setting response_format to json_object makes the reply parseable JSON, so it slots straight into a ticket, brief, or threat-intel tracker; the field list itself is only requested in the prompt, not enforced.

from openai import OpenAI

llm = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def generate_threat_report(
    topic: str,
    threat_research: str,
    advisories: list[dict],
    news: list[dict],
    context: dict
) -> dict | None:
    """Generate a structured threat intelligence report."""

    advisory_text = "\n".join(
        f"- {a['title'][:80]}: {a['content'][:400]}"
        for a in advisories[:6]
        if a.get("content")
    )

    news_text = "\n".join(
        f"- {n.get('title', '')} ({n.get('source', '')})"
        for n in news[:5]
    )

    scope = context.get("scope", "enterprise")
    urgency = context.get("urgency", "routine")
    org_type = context.get("org_type", "general")

    response = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": f"""You are a threat intelligence analyst supporting DEFENSIVE security research.
Write a clear, structured threat intelligence report. Only report information from the provided sources.
Focus on understanding threats to build better defenses. Do NOT provide operational attack instructions,
working exploit code, or step-by-step intrusion guidance.
Be factual about severity and exploitation status as publicly reported.
If a detail (CVSS, CWE, patch) is not in the sources, say so rather than guessing."""
            },
            {
                "role": "user",
                "content": f"""Threat intelligence report on: {topic}

Scope: {scope} | Urgency: {urgency} | Org type: {org_type}

Threat Landscape Synthesis:
{threat_research[:2000]}

Advisories & Technical Sources Found:
{advisory_text}

Recent News & Disclosures:
{news_text}

Return JSON with:
- topic: string
- threat_type: "vulnerability" | "malware" | "attack-technique" | "threat-actor" | "supply-chain" | "configuration-risk"
- severity: "critical" | "high" | "medium" | "low" | "informational"
- affected_systems: list of strings (OSes, software, versions, configurations)
- attack_vector: string (how the attack works — technical summary)
- technical_details: string (deeper breakdown — CVSS score if available, CWE type)
- indicators_of_compromise: list of strings (IOCs: file hashes, domains, IP ranges, behavioral patterns — "None identified from public sources" if unavailable)
- mitigation_steps: list of 4-6 strings (actionable defensive steps)
- patch_status: string (patched/unpatched/partial — what's available and from whom)
- recent_developments: list of 3 strings (from the news — recent patches, exploits in the wild, breach reports)
- references: list of 3-5 strings (CVE IDs, MITRE ATT&CK techniques, vendor advisory URLs from findings)
- risk_score: number 1-10 (1=low risk, 10=critical/active exploitation)
- disclaimer: "This report summarizes publicly available security information for defensive research purposes only. Always verify findings with authoritative sources (NVD, vendor advisories) and consult your security team before taking action." """
            }
        ],
        response_format={"type": "json_object"}
    )

    try:
        return json.loads(response.choices[0].message.content)
    except (json.JSONDecodeError, TypeError, IndexError):
        # Malformed JSON, a None content field, or an empty choices list
        return None
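
JSON mode yields syntactically valid JSON but does not check the fields or enumerated values requested in the prompt, so a light validation pass before the report reaches a ticket or tracker can catch drift. The sketch below checks only a handful of the fields listed in the prompt; `validate_report` and the two constant names are hypothetical, and the allowed values are copied from the prompt above.

```python
ALLOWED_SEVERITIES = {"critical", "high", "medium", "low", "informational"}
ALLOWED_THREAT_TYPES = {
    "vulnerability", "malware", "attack-technique",
    "threat-actor", "supply-chain", "configuration-risk",
}
LIST_FIELDS = (
    "affected_systems", "indicators_of_compromise",
    "mitigation_steps", "recent_developments", "references",
)

def validate_report(report) -> list:
    """Return a list of problems; an empty list means these checks passed."""
    if not isinstance(report, dict):
        return ["report is not a JSON object"]
    problems = []
    if report.get("severity") not in ALLOWED_SEVERITIES:
        problems.append("severity is not one of the allowed values")
    if report.get("threat_type") not in ALLOWED_THREAT_TYPES:
        problems.append("threat_type is not one of the allowed values")
    score = report.get("risk_score")
    # bool is a subclass of int in Python, so exclude it explicitly
    is_number = isinstance(score, (int, float)) and not isinstance(score, bool)
    if not is_number or not 1 <= score <= 10:
        problems.append("risk_score must be a number from 1 to 10")
    for field in LIST_FIELDS:
        if not isinstance(report.get(field), list):
            problems.append(f"{field} must be a list")
    return problems
```

A non-empty result is a reasonable trigger for a single retry or for routing the report to manual review instead of publishing it.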

8. Wire up the full pipeline

The orchestrator runs research, two search passes, scrape, and news, then hands the lot to the LLM. A context dict tunes the report to your environment and urgency.

def research_security_topic(
    topic: str,
    context: dict | None = None,
    max_pages: int = 5
) -> dict | None:
    """
    Research a security topic, CVE, threat actor, or attack technique.

    topic: e.g. "Log4Shell CVE-2021-44228", "SQL injection", "APT29 TTPs",
           "AWS S3 misconfiguration exposure", "Chrome zero-day 2025"
    context: {
        "scope": "enterprise" | "cloud" | "iot" | "web-app",
        "urgency": "immediate" | "routine",
        "org_type": "financial" | "healthcare" | "tech" | "general"
    }
    """
    if context is None:
        context = {"scope": "enterprise", "urgency": "routine", "org_type": "general"}

    print(f"Researching: {topic}")

    # Deep threat research
    print("Synthesizing threat landscape...")
    landscape = research_threat(topic)

    # Find advisories and technical write-ups
    print("Finding CVEs, advisories, and technical sources...")
    advisories_raw = find_advisories(topic)
    technical_raw = find_technical_details(topic)

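    # Scrape the strongest sources. Interleave the two result lists first:
    # slicing advisories_raw + technical_raw would fill every max_pages slot
    # from the advisory search whenever it returns at least max_pages results,
    # so the technical write-ups would never be scraped.
    from itertools import zip_longest
    print(f"Scraping up to {max_pages} security pages...")
    advisories = []
    candidates = [
        r for pair in zip_longest(advisories_raw, technical_raw) for r in pair if r
    ]
    for result in candidates[:max_pages]:
        url = result.get("url")
        if not url:
            continue  # skip results without a link
        page = scrape_security_page(url)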
        if page["content"]:
            page["title"] = page["title"] or result.get("title", "")
            advisories.append(page)

    # Recent threat news
    print("Fetching recent disclosures and patch news...")
    news = get_threat_news(topic)

    # Generate report
    print("Generating threat intelligence report...")
    return generate_threat_report(topic, landscape, advisories, news, context)

def print_report(report: dict):
    if not report:
        print("Could not generate report.")
        return

    print(f"\n{'='*60}")
    print(f"Threat Intel: {report.get('topic', 'Topic')}")
    print(f"{'='*60}")
    print(f"\n! {report.get('disclaimer', '')}\n")
    print(f"Type:     {report.get('threat_type', '?')}")
    print(f"Severity: {report.get('severity', '?')}")
    print(f"Risk:     {report.get('risk_score', '?')}/10")
    print(f"Patch:    {report.get('patch_status', '?')}\n")

    print(f"Attack Vector: {report.get('attack_vector', '')}\n")
    print(f"Technical Details: {report.get('technical_details', '')}\n")

    affected = report.get("affected_systems", [])
    if affected:
        print("Affected Systems:")
        for a in affected:
            print(f"  * {a}")

    iocs = report.get("indicators_of_compromise", [])
    if iocs:
        print("\nIndicators of Compromise:")
        for i in iocs:
            print(f"  - {i}")

    mitigations = report.get("mitigation_steps", [])
    if mitigations:
        print("\nMitigation Steps:")
        for m in mitigations:
            print(f"  [ ] {m}")

    developments = report.get("recent_developments", [])
    if developments:
        print("\nRecent Developments:")
        for d in developments:
            print(f"  ! {d}")

    refs = report.get("references", [])
    if refs:
        print("\nReferences:")
        for ref in refs:
            print(f"  -> {ref}")

if __name__ == "__main__":
    import sys
    topic = " ".join(sys.argv[1:]) if len(sys.argv) > 1 else "Log4Shell CVE-2021-44228"

    CONTEXT = {
        "scope": "enterprise",
        "urgency": "immediate",
        "org_type": "tech"
    }

    report = research_security_topic(topic, CONTEXT, max_pages=5)
    print_report(report)  # prints a fallback message when report is None

9. Common topics

CVE lookups — "CVE-2024-XXXXX", "Log4Shell CVE-2021-44228", "Heartbleed".
Attack techniques — "SQL injection", "prompt injection LLM", "ransomware lateral movement".
Threat actors — "APT29 TTPs", "Lazarus Group supply chain attacks".
Configuration risks — "AWS S3 misconfiguration exposure", "Kubernetes RBAC risks".
Zero-days — "Chrome zero-day 2025", "Windows privilege escalation recent".

10. Use cases

SOC analyst — research an alert: is this CVE actively exploited, what's the severity, and is there a patch?
Bug bounty hunter — understand the technical landscape of a vulnerability class before testing in scope.
Developer — threat modeling: which attack techniques target the stack you're building on?
CISO — weekly threat briefing: which new critical CVEs affect your systems?
Security researcher — a fast literature sweep on an attack technique or a threat actor's TTPs.

11. Extending the agent

CVE monitor — schedule daily to search /news?q=CVE+critical+zero-day and alert on new critical disclosures.
Patch tracking — re-run weekly on a specific CVE ID and diff patch_status to catch when a fix ships.
Asset-specific scope — list your stack (Python, nginx, PostgreSQL) and run a batch of targeted queries.
MITRE ATT&CK mapping — extend the LLM prompt to map findings to specific ATT&CK technique IDs (e.g. T1566 for phishing).
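
The patch-tracking idea comes down to comparing two reports for the same topic and surfacing the fields that moved. A minimal sketch, assuming both reports are dicts shaped like the JSON requested in section 7; `diff_reports` and its default field list are hypothetical choices.

```python
def diff_reports(previous: dict, current: dict,
                 fields=("patch_status", "severity", "risk_score")) -> dict:
    """Map each changed field to an (old, new) tuple; unchanged fields are omitted."""
    changes = {}
    for field in fields:
        old, new = previous.get(field), current.get(field)
        if old != new:
            changes[field] = (old, new)
    return changes
```

Since patch_status is free text written by the LLM, wording can shift between runs even when nothing has changed, so a detected difference is better treated as a prompt to re-check the vendor advisory than as proof that a fix shipped.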

⚠️ Important: responsible use. This agent retrieves and summarizes publicly available security information from CVE databases, vendor advisories, and security research publications. It is intended for defensive security research — understanding threats so you can build better defenses.

Always comply with applicable laws and your organization's security policy.
Verify findings with authoritative sources (NIST NVD, vendor advisories, MITRE ATT&CK).
This tool does not assist with offensive security testing, unauthorized access, or illegal activities.
If you find an undisclosed vulnerability, follow responsible disclosure practices.

12. Getting your API key

Grab a free Superhighway key at /pricing (1,000 calls/month, no credit card). For an agent that provisions its own access, skip the key entirely with x402: it pays $0.002 per call in USDC on Base — no signup, no key management. See the x402 pay-per-call guide for the wallet setup.

For related builds, the regulatory research agent applies the same four-endpoint pattern to compliance, and the financial research agent covers companies and markets.