Build an ESG research agent

An ESG analyst's first pass on a company means hunting across sustainability reports, CDP questionnaires, MSCI and Sustainalytics ratings, CSRD and SEC filings, and a stream of news about controversies and commitments, then stitching it all into an Environmental / Social / Governance picture. This guide builds a Python agent that automates that loop. It chains all four Superhighway endpoints (/research for the ESG landscape and sector benchmarks, /search for sustainability reports and third-party ratings, /scrape for metrics from disclosure pages, and /news for controversies and regulatory changes), then uses an LLM to emit a structured ESG brief as JSON with E/S/G pillar summaries, key risks, and regulatory exposure.

1. What you'll build

A Python agent that takes a company name, sector, or ESG topic and produces a structured ESG research brief:

Synthesizes the ESG landscape: regulatory requirements, sector benchmarks, and key frameworks
Finds sustainability reports, ESG ratings, regulatory filings, and third-party assessments
Scrapes ESG disclosure pages and sustainability reports for specific metrics
Pulls recent ESG news: controversies, milestones, regulatory changes, and peer comparisons
Uses an LLM to generate a structured ESG brief with an E/S/G pillar breakdown as JSON

2. Setup

pip install openai requests python-dotenv

Create a .env file with your two keys:

SUPERHIGHWAY_API_KEY=your_key_here
OPENAI_API_KEY=your_key_here
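
A missing key otherwise only shows up later as a failed API call, so it can help to check for both up front. The sketch below assumes the two variable names above are the only configuration the agent needs; missing_keys is a hypothetical helper, not part of Superhighway or python-dotenv.

```python
import os

# The two variables this guide expects to find in the environment.
REQUIRED_KEYS = ("SUPERHIGHWAY_API_KEY", "OPENAI_API_KEY")

def missing_keys(env=None) -> list[str]:
    """Return the required variable names that are unset or blank."""
    # Accept any mapping so the check can be exercised without touching os.environ.
    env = os.environ if env is None else env
    return [k for k in REQUIRED_KEYS if not (env.get(k) or "").strip()]
```

Call it after the .env file has been loaded (for example with python-dotenv's load_dotenv()) and stop with a clear message if the returned list is not empty.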

3. Research the ESG landscape

Start with /research, which pulls multi-source background into one synthesis — the regulatory requirements that apply, the sector benchmarks, and the frameworks (GRI, SASB, TCFD) that shape disclosure. This is the context that grounds every later step.

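import json
import os

import requests
from dotenv import load_dotenv

load_dotenv()  # read the keys from the .env file created in Setup

SUPERHIGHWAY_KEY = os.getenv("SUPERHIGHWAY_API_KEY")
BASE = "https://superhighway.walls.sh"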

def research_esg_landscape(subject: str) -> str:
    """Deep ESG research: regulatory context, sector benchmarks, frameworks."""
    r = requests.get(
        f"{BASE}/research",
        params={
            "q": f"{subject} ESG sustainability report carbon emissions diversity governance",
            "pages": 6
        },
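        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
        timeout=60,  # without a timeout, requests can wait indefinitely
    )
    data = r.json()
    # Fall back to raw markdown, then to "", if the synthesis is missing or null
    return (data.get("synthesis") or data.get("markdown") or "")[:3000]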
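
Because every later step builds on this call, a transient network failure here is worth retrying. A minimal sketch of retry with exponential backoff follows; with_retries and its parameters are hypothetical names for illustration, and it only helps when the wrapped call raises on failure (a connection error, a timeout, or an unparseable response).

```python
import time

def with_retries(fetch, attempts: int = 3, base_delay: float = 1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or attempts run out, doubling the delay each time."""
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            # Re-raise once the final attempt has failed.
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, then 4x, and so on.
            sleep(base_delay * 2 ** attempt)
```

Usage would look like with_retries(lambda: research_esg_landscape("Unilever")). The sleep parameter is injectable so the backoff can be tested without real waiting.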

4. Find ESG reports and ratings

Two /search calls. The first hunts for official sustainability reports and regulatory disclosures; the second hunts for third-party ratings and any controversy or greenwashing coverage — the outside view that balances a company's own reporting.

def find_esg_reports(subject: str) -> list[dict]:
    """Find official sustainability reports and regulatory disclosures."""
    r = requests.get(
        f"{BASE}/search",
        params={
            "q": f"{subject} sustainability report ESG disclosure CSRD SEC climate GRI SASB 2024 2025",
            "limit": 6
        },
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
        timeout=30,  # without a timeout, requests can wait indefinitely
    )
    return r.json().get("results", [])

def find_esg_ratings(subject: str) -> list[dict]:
    """Find third-party ratings and controversy coverage."""
    r = requests.get(
        f"{BASE}/search",
        params={
            "q": f"{subject} ESG rating score Sustainalytics MSCI CDP controversy greenwashing",
            "limit": 6
        },
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
        timeout=30,  # without a timeout, requests can wait indefinitely
    )
    return r.json().get("results", [])
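
The two searches often return the same report under slightly different URLs (a trailing slash, a #fragment, different host casing). One possible way to merge them is to dedupe on a normalized URL, sketched below. It assumes each result is a dict with a "url" key, as the pipeline code in this guide does; normalize_url and merge_results are illustrative names.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_url(url: str) -> str:
    """Lowercase scheme and host, drop the fragment and any trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))

def merge_results(*result_lists: list[dict]) -> list[dict]:
    """Merge search result lists, keeping the first hit for each normalized URL."""
    seen, merged = set(), []
    for results in result_lists:
        for item in results:
            url = item.get("url")
            if not url:
                continue  # skip results that carry no URL at all
            key = normalize_url(url)
            if key not in seen:
                seen.add(key)
                merged.append(item)
    return merged
```

Passing the reports list first means official disclosures win over ratings coverage when both point at the same page.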

5. Scrape ESG disclosure pages

/scrape turns each candidate URL into clean, LLM-ready markdown. This is where the agent pulls actual metrics — emissions figures, diversity percentages, board composition — out of sustainability reports and disclosure pages.

def scrape_esg_page(url: str) -> dict:
    """Scrape a sustainability report or ESG disclosure page."""
    r = requests.get(
        f"{BASE}/scrape",
        params={"url": url},
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
        timeout=30,  # without a timeout, requests can wait indefinitely
    )
    data = r.json()
    return {
        "url": url,
        "title": data.get("title", ""),
        "content": data.get("markdown", "")[:2500],
    }
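
The scraper keeps only the first 2,500 characters of each page, so figures buried deeper in a long report are cut before the LLM sees them. A rough workaround is to pull out the lines that look like metrics before truncating. The sketch below is a keyword-and-digit heuristic, not a parser; the keyword list and the metric_lines helper are assumptions for illustration.

```python
import re

# Illustrative keyword list, not an official ESG taxonomy.
METRIC_KEYWORDS = ("scope 1", "scope 2", "scope 3", "tco2e", "emissions",
                   "renewable", "women", "diversity", "independent directors")

def metric_lines(markdown: str, limit: int = 20) -> list[str]:
    """Return lines that mention a metric keyword and contain at least one digit."""
    hits = []
    for line in markdown.splitlines():
        text = line.strip()
        lowered = text.lower()
        if re.search(r"\d", text) and any(k in lowered for k in METRIC_KEYWORDS):
            hits.append(text)
            if len(hits) == limit:
                break  # cap the output so the prompt stays small
    return hits
```

Run it on the full markdown from /scrape and prepend the hits to the truncated content, so that the numbers survive even when the surrounding prose does not.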

6. Get recent ESG news

/news surfaces what just happened — controversies, new climate commitments, regulatory updates, and peer comparisons. This is the time-sensitive layer that a static sustainability report misses.

def get_esg_news(subject: str) -> list[dict]:
    """Get recent ESG controversies, milestones, and regulatory changes."""
    r = requests.get(
        f"{BASE}/news",
        params={
            "q": f"{subject} ESG sustainability climate emissions controversy regulation",
            "count": 6
        },
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
        timeout=30,  # without a timeout, requests can wait indefinitely
    )
    return r.json().get("articles", [])
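
Since the brief is organized by pillar, it can help to see which pillar each headline touches before handing the news to the LLM. The sketch below is a simple substring match; the keyword lists and the tag_pillars helper are illustrative assumptions, and it relies only on each article having a "title" field, which the brief generator in this guide also reads.

```python
# Illustrative keywords per pillar; tune these for your sector.
PILLAR_KEYWORDS = {
    "environmental": ("climate", "emission", "carbon", "pollution", "net zero"),
    "social": ("labor", "labour", "diversity", "human rights", "safety", "strike"),
    "governance": ("board", "executive pay", "bribery", "audit", "shareholder"),
}

def tag_pillars(articles: list[dict]) -> dict[str, list[str]]:
    """Group article titles by the pillars whose keywords appear in them."""
    grouped = {pillar: [] for pillar in PILLAR_KEYWORDS}
    grouped["untagged"] = []
    for article in articles:
        title = article.get("title", "")
        lowered = title.lower()
        matched = [p for p, words in PILLAR_KEYWORDS.items()
                   if any(w in lowered for w in words)]
        # A headline can land in several pillars; unmatched ones go to "untagged".
        for pillar in matched or ["untagged"]:
            grouped[pillar].append(title)
    return grouped
```

Substring matching is crude (for example, "board" also matches "onboard"), so treat the grouping as a triage aid rather than a classification.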

7. Generate the ESG brief with an LLM

Now hand everything to the LLM. The system prompt forbids inventing metrics and is explicit that the output is not a formal ESG rating and not investment advice — the agent summarizes what's in the sources. The output is structured JSON so it slots straight into a report or a screening dashboard.

from openai import OpenAI

llm = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def generate_brief(
    subject: str,
    landscape: str,
    esg_pages: list[dict],
    news: list[dict],
    context: dict | None = None
) -> dict | None:
    """Generate a structured ESG research brief."""
    context = context or {}
    subject_type = context.get("subject_type", "company")
    region = context.get("region", "global")
    focus = context.get("focus", "all")

    pages_text = "\n".join(
        f"- {p['title']}: {p['content'][:400]}"
        for p in esg_pages[:5]
        if p.get("content")
    )

    news_text = "\n".join(
        f"- {n.get('title', '')} ({n.get('source', '')})"
        for n in news[:6]
    )

    disclaimer = (
        "This summary is based on publicly available information and does not "
        "constitute investment advice or a formal ESG rating. ESG data quality "
        "varies significantly by company and region. For investment decisions, "
        "use certified ESG data providers."
    )

    response = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": f"""You are an ESG analyst writing a first-pass research brief.
Subject type: {subject_type}. Region focus: {region}. Pillar focus: {focus}.
Be specific and factual. Only use information from the provided sources.
Do not invent metrics — if a figure isn't in the sources, say so.
This is NOT a formal ESG rating and NOT investment advice."""
            },
            {
                "role": "user",
                "content": f"""Write an ESG brief for: {subject}.

ESG Landscape & Regulatory Context:
{landscape[:2000]}

Reports, Ratings & Disclosures:
{pages_text}

Recent ESG News:
{news_text}

Return JSON with:
- subject: string (the company, sector, or topic researched)
- esg_score_estimate: "leader" | "average" | "laggard" | "insufficient-data"
- environmental_summary: string (carbon emissions, climate targets, resource use, waste, energy mix)
- social_summary: string (labor practices, diversity metrics, supply chain ethics, community impact)
- governance_summary: string (board composition, executive pay, transparency, shareholder rights, anti-corruption)
- key_risks: list of 3-4 strings (material ESG risks: stranded assets, regulatory exposure, reputational)
- positive_highlights: list of 2-3 strings (genuine ESG strengths or progress)
- regulatory_exposure: string (which disclosure regimes apply: CSRD, SEC, TCFD, GRI, SASB)
- recent_developments: list of 3 strings (from news — controversies, milestones, new commitments)
- data_sources: list of 3-5 strings (what sources were found: sustainability report, CDP, MSCI, etc.)
- confidence: "high" | "medium" | "low" (based on availability of official disclosures)
- disclaimer: "{disclaimer}"
"""
            }
        ],
        response_format={"type": "json_object"}
    )

    try:
        brief = json.loads(response.choices[0].message.content)
        brief["disclaimer"] = disclaimer
        return brief
    except (json.JSONDecodeError, KeyError, TypeError):
        # TypeError covers an empty (None) message or a non-object JSON value
        return None
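
JSON mode guarantees parseable JSON, not that the model used the requested field names or enum values. A lightweight check before the brief reaches a report or dashboard could look like the sketch below. The field names and allowed values are taken from the prompt above; brief_problems is a hypothetical helper.

```python
# Enum-style fields and the values the prompt asks for.
ALLOWED = {
    "esg_score_estimate": {"leader", "average", "laggard", "insufficient-data"},
    "confidence": {"high", "medium", "low"},
}
REQUIRED_STRINGS = ("subject", "environmental_summary", "social_summary",
                    "governance_summary", "regulatory_exposure")
REQUIRED_LISTS = ("key_risks", "positive_highlights",
                  "recent_developments", "data_sources")

def brief_problems(brief: dict) -> list[str]:
    """Return human-readable schema problems; an empty list means the shape looks right."""
    problems = []
    for key, allowed in ALLOWED.items():
        value = brief.get(key)
        if not isinstance(value, str) or value not in allowed:
            problems.append(f"{key}: expected one of {sorted(allowed)}")
    for key in REQUIRED_STRINGS:
        if not isinstance(brief.get(key), str):
            problems.append(f"{key}: expected a string")
    for key in REQUIRED_LISTS:
        if not isinstance(brief.get(key), list):
            problems.append(f"{key}: expected a list")
    return problems
```

This checks shape only. It says nothing about whether the summaries are faithful to the sources, which still needs a human read.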

8. The full research pipeline

Wire the steps together: research the landscape, find reports and ratings, scrape the top pages, pull news, then generate the brief. The context argument lets you narrow by subject type, region, and which pillar to emphasize.

def research_esg(
    subject: str,
    context: dict | None = None,
    max_pages: int = 5
) -> dict | None:
    """
    Run the full ESG research pipeline.

    context = {
        "subject_type": "company" | "sector" | "topic",
        "region": "US" | "EU" | "global",
        "focus": "all" | "environmental" | "social" | "governance",
    }
    """
    print(f"Researching ESG profile: {subject}")

    # Step 1: ESG landscape
    print("Researching ESG landscape...")
    landscape = research_esg_landscape(subject)

    # Step 2: Find reports and ratings
    print("Finding reports, disclosures, and ratings...")
    results = find_esg_reports(subject) + find_esg_ratings(subject)

    # Dedupe by URL
    seen, candidates = set(), []
    for r in results:
        url = r.get("url")
        if url and url not in seen:
            seen.add(url)
            candidates.append(r)

    # Step 3: Scrape top pages
    print(f"Scraping {min(len(candidates), max_pages)} ESG pages...")
    esg_pages = []
    for result in candidates[:max_pages]:
        page = scrape_esg_page(result["url"])
        if page["content"]:
            esg_pages.append(page)

    # Step 4: Recent news
    print("Pulling ESG news...")
    news = get_esg_news(subject)

    # Step 5: Generate brief
    print("Generating ESG brief...")
    return generate_brief(subject, landscape, esg_pages, news, context)

def print_brief(brief: dict | None) -> None:
    if not brief:
        print("Could not generate brief.")
        return

    print(f"\n{'='*60}")
    print(f"ESG Brief — {brief.get('subject', 'Subject')}")
    print(f"Estimate: {brief.get('esg_score_estimate', '?').upper()}  |  "
          f"Confidence: {brief.get('confidence', '?').upper()}")
    print(f"{'='*60}")

    print(f"\nEnvironmental:\n  {brief.get('environmental_summary', '')}")
    print(f"\nSocial:\n  {brief.get('social_summary', '')}")
    print(f"\nGovernance:\n  {brief.get('governance_summary', '')}")

    print("\nKey Risks:")
    for risk in brief.get("key_risks", []):
        print(f"  ! {risk}")

    print("\nPositive Highlights:")
    for h in brief.get("positive_highlights", []):
        print(f"  + {h}")

    print(f"\nRegulatory Exposure:\n  {brief.get('regulatory_exposure', '')}")

    developments = brief.get("recent_developments", [])
    if developments:
        print("\nRecent Developments:")
        for d in developments:
            print(f"  * {d}")

    sources = brief.get("data_sources", [])
    if sources:
        print("\nData Sources:")
        for s in sources:
            print(f"  - {s}")

    print(f"\n{brief.get('disclaimer', '')}")

if __name__ == "__main__":
    import sys

    # Usage: python agent.py <subject> [subject_type] [region] [focus]
    # Example: python agent.py "Unilever" company EU
    if len(sys.argv) >= 2:
        subject_arg = sys.argv[1]
        ctx = {
            "subject_type": sys.argv[2] if len(sys.argv) > 2 else "company",
            "region": sys.argv[3] if len(sys.argv) > 3 else "global",
            "focus": sys.argv[4] if len(sys.argv) > 4 else "all",
        }
    else:
        subject_arg = "Apple ESG sustainability"
        ctx = {"subject_type": "company", "region": "global", "focus": "all"}

    brief = research_esg(subject_arg, context=ctx)
    # print_brief handles a None brief itself, so the failure message is not skipped
    print_brief(brief)
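
In the pipeline above, a single scrape that raises (a connection error, an unparseable response) ends the whole run. One option is to collect failures and carry on with the pages that did load. The sketch below takes the scrape function as an argument so it does not depend on any particular client; scrape_many is a hypothetical helper.

```python
def scrape_many(urls: list[str], scrape, max_pages: int = 5) -> tuple[list[dict], list[str]]:
    """Scrape up to max_pages URLs, collecting failures instead of raising."""
    pages, failed = [], []
    for url in urls[:max_pages]:
        try:
            page = scrape(url)
        except Exception:
            failed.append(url)  # remember the URL and move on to the next one
            continue
        if page.get("content"):
            pages.append(page)  # keep only pages that returned some text
    return pages, failed
```

In research_esg this could replace the scraping loop with esg_pages, failed = scrape_many([c["url"] for c in candidates], scrape_esg_page, max_pages), with the failed list logged for follow-up.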

9. What you can research

Company ESG profiles — "Apple ESG sustainability", "Tesla ESG controversy", "Unilever sustainability report".
Sector ESG landscape — "oil gas sector ESG transition", "fast fashion ESG social risk", "banking sector ESG risk".
Regulatory requirements — "EU CSRD disclosure requirements", "SEC climate disclosure rules 2025", "TCFD reporting".
Specific ESG topics — "scope 3 emissions reporting", "DEI metrics disclosure", "supply chain human rights due diligence".
ESG controversies — "greenwashing claims 2025", "ESG rating methodology controversy".

10. Use cases

ESG analyst — a rapid first-pass profile before a deep-dive on a company or sector.
Impact investor — pre-screen companies for ESG alignment before formal due diligence.
Sustainability officer — benchmark your company's disclosure against peers in your sector.
Journalist / researcher — track ESG controversies, regulatory changes, and corporate commitments.
Startup founder — understand what ESG disclosures investors will expect as you scale.

11. Extending the agent

Peer comparison — run for 5 companies in a sector and compare esg_score_estimate and key_risks side by side.
Controversy monitoring — schedule weekly with /news?q={company}+ESG+controversy+greenwashing to catch reputational risks early.
Regulatory tracker — run monthly for EU CSRD requirements 2025 2026 to stay current on disclosure obligations.
Portfolio sweep — wrap in a loop over your portfolio companies and output a CSV of score estimates plus key risks.
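
The peer comparison and portfolio sweep ideas both come down to flattening several briefs into one table. A minimal sketch using the standard csv module follows; briefs_to_csv is an illustrative name and the column choice is an assumption, using field names from the brief schema in this guide.

```python
import csv
import io

def briefs_to_csv(briefs: list) -> str:
    """Flatten briefs into CSV text: one row per brief, key risks joined with '; '."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["subject", "esg_score_estimate", "confidence", "key_risks"])
    for brief in briefs:
        if not brief:
            continue  # skip companies whose brief could not be generated
        writer.writerow([
            brief.get("subject", ""),
            brief.get("esg_score_estimate", ""),
            brief.get("confidence", ""),
            "; ".join(str(r) for r in brief.get("key_risks", [])),
        ])
    return buf.getvalue()
```

A sweep would then be briefs_to_csv([research_esg(name) for name in portfolio]), with the result written to a file.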

12. Important disclaimer

This tool retrieves and summarizes publicly available ESG information. It is not a formal ESG rating and does not constitute investment advice. ESG data quality varies significantly — companies with strong disclosure programs will produce more reliable outputs than those with limited public reporting. For investment decisions, use certified ESG data providers (MSCI, Sustainalytics, CDP).

13. Getting your API key

Grab a free Superhighway key at /pricing (1,000 calls/month, no credit card). For an agent that provisions its own access, skip the key entirely with x402: it pays $0.002 per call in USDC on Base — no signup, no key management. See the x402 pay-per-call guide for the wallet setup.

For related builds, the financial research agent uses the same four-endpoint pattern on companies and markets, and the regulatory research agent covers compliance and disclosure rules.