Biotech Pipeline & Clinical Trial Research Agent

Biotech pipeline and clinical trial research covers clinical-stage assets: Phase I/II/III readouts, FDA regulatory pathways (Breakthrough/Fast Track/Priority Review/Orphan/RMAT), competitive pipeline intelligence (best-in-class vs. first-in-class), and biopharma M&A/licensing. In this field a single PDUFA date, AdCom vote, or pivotal trial readout can move a valuation by billions. This guide builds a Python agent for biotech/pharma investors, competitive intelligence teams, and the other audiences listed below. It chains all four Superhighway endpoints (/research for the mechanism and evidence synthesis, /search against authoritative clinical/regulatory sources and the competitive landscape, /scrape for a specific trial record or FDA action, and /news for trial readouts and FDA decisions), calling /search twice, then uses an LLM to emit a structured pipeline brief as JSON.

Overview

The agent takes a drug, program, company, or therapeutic area — "GLP-1 obesity drug pipeline competitive landscape semaglutide tirzepatide", "CAR-T cell therapy hematologic malignancy pipeline approved products" — and produces a structured biotech pipeline and clinical trial brief:

Synthesizes the evidence base: drug mechanism, clinical evidence, competitive landscape, regulatory history, and market context
Searches authoritative clinical and regulatory sources — ClinicalTrials.gov, STAT News, Fierce Pharma, Endpoints News — for trial data, efficacy/safety results, FDA actions, PDUFA dates, and AdCom outcomes
Searches biopharma competitive intelligence — pipeline comparison (best-in-class vs. first-in-class), competitive trial read-outs, licensing deals, M&A premiums, biotech VC funding, IPO/SPAC activity, biosimilar entry timelines, and patent cliff exposure
Scrapes one relevant page: a ClinicalTrials.gov study record, an FDA label/approval letter, a company pipeline page, or a STAT News trial result article
Pulls recent news: trial readouts, FDA approvals/rejections/CRLs, PDUFA outcomes, AdCom votes, biotech M&A, IND/NDA/BLA filings, and earnings pipeline updates
Uses an LLM to generate a structured brief — modality, therapeutic area, stage, clinical evidence, regulatory status, competition, commercial opportunity, deals, and investment signals as JSON

Who it's for: biotech/pharma investors (VC/hedge funds), biopharma competitive intelligence teams, clinical researchers, biotech startup founders, pharma business development teams, healthcare analysts, and life sciences consultants.

Scope note: This agent covers clinical-stage assets, meaning clinical trials, FDA pathways, pipeline competition, and biopharma M&A/licensing. For target identification, lead optimization, and preclinical drug discovery science, see the drug discovery research agent.

Not medical or investment advice. This agent surfaces publicly available clinical and regulatory intelligence for research purposes only — not medical recommendations, treatment guidance, or investment advice. Consult a licensed physician for medical decisions and a financial advisor for investment decisions.

How it works

Five endpoint calls feed one LLM synthesis:

/research — deep synthesis: drug mechanism, clinical evidence base, competitive landscape, regulatory history, and market context.
/search (clinical/regulatory sources) — trial data, efficacy/safety results, FDA actions, PDUFA dates, and AdCom outcomes scoped to ClinicalTrials.gov, STAT News, Fierce Pharma, and Endpoints News.
/search (competitive landscape, time=year) — pipeline comparison, competitive read-outs, licensing deals, M&A premiums, biotech VC funding, IPO/SPAC activity, biosimilar timelines, and patent cliff exposure.
/scrape — one relevant URL, e.g. a ClinicalTrials.gov study record, an FDA label/approval letter, a company pipeline page, or a STAT News trial result article.
/news (time=week) — very recent trial readouts, FDA approvals/rejections/CRLs, PDUFA outcomes, AdCom votes, biotech M&A, IND/NDA/BLA filings, and earnings pipeline updates.
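
The /scrape step takes a single URL, so it helps to choose that URL deliberately. The following is a minimal sketch, assuming search results are dicts with a "url" key (as in the full example below) and that primary sources should outrank news coverage. The `PREFERRED_DOMAINS` ordering and the `domain_rank` and `pick_scrape_url` helpers are illustrative names, not part of the Superhighway API:

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical priority order: registries and regulators first,
# then the trade press named in this guide.
PREFERRED_DOMAINS = [
    "clinicaltrials.gov",
    "fda.gov",
    "statnews.com",
    "fiercepharma.com",
    "endpoints.news",
]


def domain_rank(url: str) -> int:
    """Return the index of the first preferred domain matching the URL's host."""
    host = (urlparse(url).hostname or "").lower()
    for rank, domain in enumerate(PREFERRED_DOMAINS):
        # Match the bare domain or any subdomain (e.g. www., accessdata.)
        if host == domain or host.endswith("." + domain):
            return rank
    return len(PREFERRED_DOMAINS)  # unknown hosts sort last


def pick_scrape_url(results: list[dict]) -> Optional[str]:
    """Pick the URL on the most-preferred domain, keeping search order on ties."""
    urls = [r["url"] for r in results if r.get("url")]
    if not urls:
        return None
    return min(urls, key=domain_rank)  # min() keeps the first of equal-ranked items
```

Passing the combined clinical and competitive results through `pick_scrape_url` before scraping would favor a trial record or FDA page over an article when both appear in the results.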

Full example

pip install openai requests python-dotenv

Create a .env file with your two keys:

SUPERHIGHWAY_API_KEY=your_key_here
OPENAI_API_KEY=your_key_here
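
Because the script reads both keys with `os.getenv`, a missing key only surfaces later as a failed request. A small upfront check can make that failure obvious. This is a sketch only; `REQUIRED_KEYS` and `missing_keys` are hypothetical names introduced here for illustration:

```python
import os
from typing import Mapping

# The two variables this guide expects in the environment or .env file
REQUIRED_KEYS = ("SUPERHIGHWAY_API_KEY", "OPENAI_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required keys that are unset or blank."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]
```

Calling `missing_keys()` once the .env file has been loaded (for example with python-dotenv's `load_dotenv()`) and exiting when the list is non-empty turns a confusing downstream error into a clear startup message.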

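import json
import os

import requests
from dotenv import load_dotenv
from openai import OpenAI

# Load SUPERHIGHWAY_API_KEY and OPENAI_API_KEY from the .env file
load_dotenv()

SUPERHIGHWAY_KEY = os.getenv("SUPERHIGHWAY_API_KEY")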
BASE = "https://superhighway.walls.sh"
HEADERS = {"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"}

llm = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# 1. Deep synthesis of the drug / program / therapeutic area
def research_asset(query: str) -> str:
    """Mechanism, clinical evidence, competitive landscape, regulatory history."""
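    r = requests.get(
        f"{BASE}/research",
        params={"q": f"{query} biotech clinical trial pipeline FDA approval"},
        headers=HEADERS,
        timeout=120,  # deep synthesis may be slow; tune as needed
    )
    data = r.json()
    # Tolerate a missing or null summary before truncating
    return (data.get("summary") or "")[:3000]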

# 2. Authoritative clinical / regulatory sources
def search_clinical(query: str) -> list[dict]:
    """Trial data, efficacy/safety results, FDA actions, PDUFA dates, AdCom outcomes."""
    r = requests.get(
        f"{BASE}/search",
        params={
            "q": f"{query} site:clinicaltrials.gov OR site:statnews.com "
                 f"OR site:fiercepharma.com OR site:endpoints.news "
                 f"clinical trial data efficacy safety results",
        },
        headers=HEADERS,
        timeout=30,  # avoid hanging indefinitely on a stalled connection
    )
    return r.json().get("results") or []

# 3. Biopharma competitive intelligence (last year)
def search_competitive(query: str) -> list[dict]:
    """Pipeline comparison, competitive read-outs, licensing deals, M&A, funding."""
    r = requests.get(
        f"{BASE}/search",
        params={
            "q": f"{query} biotech pharma pipeline competitive landscape "
                 f"drug approval M&A licensing deal",
            "time": "year",
        },
        headers=HEADERS,
        timeout=30,  # avoid hanging indefinitely on a stalled connection
    )
    return r.json().get("results") or []

# 4. Scrape one relevant trial record / FDA action / pipeline page
def scrape_page(url: str) -> dict:
    """Pull a ClinicalTrials.gov record, FDA label, pipeline page, or trial article."""
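    r = requests.post(
        f"{BASE}/scrape",
        json={"url": url, "mode": "markdown"},
        headers=HEADERS,
        timeout=60,  # avoid hanging indefinitely on a slow page
    )
    data = r.json()
    return {
        "url": url,
        "title": data.get("title", ""),
        # Prefer markdown, fall back to plain text, and tolerate null fields
        "content": (data.get("markdown") or data.get("text") or "")[:2500],
    }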

# 5. Recent news: trial readouts, FDA decisions, M&A (last week)
def get_news(query: str) -> list[dict]:
    """Trial readouts, FDA approvals/rejections/CRLs, PDUFA outcomes, AdCom votes, M&A."""
    r = requests.get(
        f"{BASE}/news",
        params={
            "q": f"{query} clinical trial results FDA approval biotech "
                 f"earnings pipeline update",
            "time": "week",
        },
        headers=HEADERS,
        timeout=30,  # avoid hanging indefinitely on a stalled connection
    )
    return r.json().get("results") or []

def generate_brief(
    query: str,
    synthesis: str,
    clinical: list[dict],
    competitive: list[dict],
    scraped: dict | None,
    news: list[dict],
) -> dict | None:
    """Generate a structured biotech pipeline & clinical trial brief as JSON."""

    clinical_text = "\n".join(
        f"- {r.get('title', '')}: {r.get('snippet', '')} ({r.get('url', '')})"
        for r in clinical[:6]
    )
    competitive_text = "\n".join(
        f"- {r.get('title', '')}: {r.get('snippet', '')}"
        for r in competitive[:6]
    )
    news_text = "\n".join(
        f"- {n.get('title', '')}: {n.get('snippet', '')}"
        for n in news[:6]
    )
    scraped_text = ""
    if scraped and scraped.get("content"):
        scraped_text = f"{scraped['title']}\n{scraped['content']}"

    response = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a biotech equity research and clinical development analyst. "
                    "Use ONLY the provided sources. Do not invent efficacy data, trial "
                    "endpoints, PDUFA dates, deal values, or market sizes — if a detail is "
                    "not in the sources, say 'not found in sources.' Be precise about "
                    "development stage, regulatory status, and clinical endpoints, and flag "
                    "when figures may be estimates or consensus projections. This is "
                    "research synthesis, not medical or investment advice."
                ),
            },
            {
                "role": "user",
                "content": f"""Write a biotech pipeline & clinical trial brief for: {query}

Evidence Synthesis:
{synthesis}

Clinical & Regulatory Sources (ClinicalTrials.gov / STAT / Fierce Pharma / Endpoints):
{clinical_text}

Competitive Landscape:
{competitive_text}

Scraped Trial Record / FDA Action / Pipeline Page:
{scraped_text}

Recent News:
{news_text}

Return JSON with ALL of these fields:
- subject: drug, program, company, or therapeutic area being researched
- drug_modality: "small-molecule" | "monoclonal-antibody" | "ADC" | "bispecific" | "cell-therapy" | "gene-therapy" | "RNA-therapy" | "protein" | "vaccine" | "biosimilar" | "mixed"
- therapeutic_area: "oncology" | "CNS" | "immunology" | "cardiovascular" | "rare-disease" | "infectious-disease" | "metabolic" | "ophthalmology" | "respiratory" | "gastroenterology" | "mixed"
- development_stage: "preclinical" | "Phase-I" | "Phase-I/II" | "Phase-II" | "Phase-III" | "NDA-BLA-submitted" | "FDA-approved" | "post-approval" | "biosimilar-pending"
- clinical_evidence_summary: key efficacy and safety data — primary endpoints, response rates/ORR, PFS/OS data if available, adverse event profile, comparison to current standard of care
- regulatory_pathway_and_status: FDA designation status (Breakthrough Therapy/Fast Track/Priority Review/Orphan Drug/RMAT), PDUFA date if filed, recent FDA actions (Complete Response Letter/approval/AdCom vote), EU/EMA status
- competitive_landscape: similar drugs in same indication — approved competitors, late-stage pipeline competition, best-in-class analysis, mechanism differentiation, patent expiry timelines for approved drugs
- commercial_opportunity: target patient population size, pricing context (comparable approved drug pricing), market size estimates, payer/reimbursement considerations (ICER analysis if available)
- deal_and_partnership_activity: recent licensing deals, M&A, co-development partnerships — deal value, milestone structure, strategic rationale; biotech-pharma BD landscape for this area
- investment_signals: biotech company valuation context, pipeline risk-adjustment, recent capital raises, short interest, insider buying, hedge fund 13F filings if notable
- data_quality: "high" | "medium" | "low" — based on availability of clinical data and regulatory documentation
- disclaimer: "Not medical or investment advice. For research purposes only. Clinical data and regulatory status should be verified against ClinicalTrials.gov and FDA.gov." """,
            },
        ],
        response_format={"type": "json_object"},
    )

    try:
        return json.loads(response.choices[0].message.content)
    except (json.JSONDecodeError, TypeError, IndexError):
        # TypeError: message content was None; IndexError: no choices returned
        return None

def research_biotech(query: str) -> dict | None:
    """Run the full biotech pipeline & clinical trial research pipeline."""
    print(f"Researching biotech: {query}")

    print("Synthesizing evidence base...")
    synthesis = research_asset(query)

    print("Searching clinical & regulatory sources...")
    clinical = search_clinical(query)

    print("Searching competitive landscape...")
    competitive = search_competitive(query)

    print("Scraping a relevant trial record / FDA action / pipeline page...")
    scraped = None
    for result in clinical + competitive:
        url = result.get("url")
        if url:
            scraped = scrape_page(url)
            if scraped.get("content"):
                break

    print("Pulling recent biotech news...")
    news = get_news(query)

    print("Generating biotech brief...")
    return generate_brief(query, synthesis, clinical, competitive, scraped, news)

def print_brief(brief: dict | None) -> None:
    if not brief:
        print("Could not generate brief.")
        return
    print(f"\n{'='*60}")
    print("Biotech Pipeline & Clinical Trial Brief")
    print(f"{'='*60}")
    print(f"\nSubject: {brief.get('subject', '')}")
    print(f"Drug Modality: {brief.get('drug_modality', '')}")
    print(f"Therapeutic Area: {brief.get('therapeutic_area', '')}")
    print(f"Development Stage: {brief.get('development_stage', '')}")
    print(f"\nClinical Evidence:\n{brief.get('clinical_evidence_summary', '')}")
    print(f"\nRegulatory Pathway & Status:\n{brief.get('regulatory_pathway_and_status', '')}")
    print(f"\nCompetitive Landscape:\n{brief.get('competitive_landscape', '')}")
    print(f"\nCommercial Opportunity:\n{brief.get('commercial_opportunity', '')}")
    print(f"\nDeal & Partnership Activity:\n{brief.get('deal_and_partnership_activity', '')}")
    print(f"\nInvestment Signals:\n{brief.get('investment_signals', '')}")
    print(f"\nData Quality: {brief.get('data_quality', '?')}")
    print(f"\n{brief.get('disclaimer', '')}")

if __name__ == "__main__":
    import sys
    query = sys.argv[1] if len(sys.argv) > 1 else "GLP-1 obesity drug pipeline competitive landscape semaglutide tirzepatide"
    brief = research_biotech(query)
    print_brief(brief)
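
JSON mode asks the model for well-formed JSON, but the prompt alone does not guarantee that every field is present or that the enumerated fields stay within the listed values. One way to catch drift is a light post-check. The sketch below mirrors the field names and allowed values from the prompt above; `REQUIRED_FIELDS`, `ALLOWED`, and `validate_brief` are hypothetical names introduced for illustration:

```python
# Field names copied from the "Return JSON with ALL of these fields" prompt
REQUIRED_FIELDS = [
    "subject", "drug_modality", "therapeutic_area", "development_stage",
    "clinical_evidence_summary", "regulatory_pathway_and_status",
    "competitive_landscape", "commercial_opportunity",
    "deal_and_partnership_activity", "investment_signals",
    "data_quality", "disclaimer",
]

# Allowed values for the enumerated fields, as listed in the prompt
ALLOWED = {
    "drug_modality": {
        "small-molecule", "monoclonal-antibody", "ADC", "bispecific",
        "cell-therapy", "gene-therapy", "RNA-therapy", "protein",
        "vaccine", "biosimilar", "mixed",
    },
    "therapeutic_area": {
        "oncology", "CNS", "immunology", "cardiovascular", "rare-disease",
        "infectious-disease", "metabolic", "ophthalmology", "respiratory",
        "gastroenterology", "mixed",
    },
    "development_stage": {
        "preclinical", "Phase-I", "Phase-I/II", "Phase-II", "Phase-III",
        "NDA-BLA-submitted", "FDA-approved", "post-approval",
        "biosimilar-pending",
    },
    "data_quality": {"high", "medium", "low"},
}


def validate_brief(brief: dict) -> list[str]:
    """Return a list of problems; an empty list means the brief looks well-formed."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in brief]
    for field, allowed in ALLOWED.items():
        if field not in brief:
            continue  # already reported as missing
        value = brief[field]
        # Non-string values (lists, null) count as unexpected too
        if not (isinstance(value, str) and value in allowed):
            problems.append(f"unexpected {field}: {value!r}")
    return problems
```

Running `validate_brief(brief)` before `print_brief(brief)` and logging any problems makes it easier to spot when the model has paraphrased a stage (for instance "Phase 3" instead of "Phase-III") or dropped a field.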

Usage examples

"GLP-1 obesity drug pipeline competitive landscape semaglutide tirzepatide" — maps the GLP-1/GIP receptor agonist class (Novo Nordisk/Eli Lilly/Amgen/Pfizer next-gen pipeline), pulls SURMOUNT-5 head-to-head data, traces the oral GLP-1 timeline and once-monthly formulations, surfaces cardiovascular outcome trial data, frames the biosimilar timeline, and flags the pricing/access controversy.
"CAR-T cell therapy hematologic malignancy pipeline approved products" — profiles the CD19/BCMA/CD22 CAR-T landscape (Kymriah/Yescarta/Carvykti/Breyanzi), covers next-gen allogeneic and in-vivo CAR-T, compares CRS/ICANS safety profiles, weighs manufacturing scalability, tracks solid-tumor CAR-T progress, and frames pricing ($400k-$500k) and CMS reimbursement.
"Alzheimer's disease amyloid antibody lecanemab donanemab FDA" — assesses amyloid hypothesis validation, the lecanemab ARIA safety signal, donanemab Phase III TRAILBLAZER results, the FDA accelerated approval pathway, the CMS reimbursement decision, diagnostic companions (amyloid PET/CSF/p-tau217), and the pipeline beyond amyloid (tau/neuroinflammation/synaptic).

Research use only. Clinical trial data and FDA status change rapidly. Verify efficacy data, regulatory status, and pipeline information directly against ClinicalTrials.gov, FDA.gov (drugs@FDA), company SEC filings (10-K/10-Q), and official press releases before making any medical or investment decisions.

Getting your API key

Grab a free Superhighway key at /pricing (1,000 calls/month, no credit card). For an agent that provisions its own access, skip the key entirely with x402: it pays $0.002 per call in USDC on Base — no signup, no key management. See the x402 pay-per-call guide for the wallet setup.
