Build a healthcare research agent
Researching a diagnosis, a treatment, or a clinical trial means jumping between PubMed abstracts, ClinicalTrials.gov, treatment guidelines, and news about FDA approvals — then trying to make sense of it all. This guide builds a Python agent that runs that loop and produces a structured health research brief. It chains all four Superhighway endpoints — /research for a multi-source synthesis of a condition, /search to find clinical trials and treatment guidelines, /scrape to pull study details off medical pages, and /news for recent FDA approvals and trial results — then uses an LLM to emit a clear brief: condition overview, standard of care, active research, key studies, and specific questions to ask your doctor.
1. What you'll build
A Python agent that takes a medical condition or treatment question and produces a structured health research brief:
- Researches the condition or treatment deeply across medical sources
- Finds relevant clinical trials and studies
- Scrapes medical pages for study details and treatment guidelines
- Pulls recent FDA / clinical news
- Uses an LLM to generate a research summary with questions to ask a doctor
2. Setup
pip install openai requests python-dotenv
Create a .env file with your two keys:
SUPERHIGHWAY_API_KEY=your_key_here
OPENAI_API_KEY=your_key_here
3. Research the medical topic
Start with /research. One call pulls multiple sources into a synthesis of the condition — pathophysiology, symptoms, diagnosis, and the standard of care — so the LLM has grounded context instead of guessing.
import json
import os

import requests
from dotenv import load_dotenv

load_dotenv()  # load SUPERHIGHWAY_API_KEY and OPENAI_API_KEY from .env

SUPERHIGHWAY_KEY = os.getenv("SUPERHIGHWAY_API_KEY")
BASE = "https://superhighway.walls.sh"


def research_condition(topic: str) -> str:
    """Deep research: pathophysiology, symptoms, diagnosis, standard of care."""
    r = requests.get(
        f"{BASE}/research",
        params={
            "q": f"{topic} pathophysiology symptoms diagnosis treatment guidelines",
            "pages": 6,
        },
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
    )
    data = r.json()
    # Prefer the synthesis, fall back to raw markdown, and cap the length for the prompt
    return (data.get("synthesis") or data.get("markdown") or "")[:3000]
4. Find clinical trials and studies
Two narrow /search calls: one hunts for clinical trials, RCTs, and systematic reviews — the primary evidence — and one looks for evidence-based treatment guidelines and clinical protocols.
def find_clinical_studies(topic: str) -> list[dict]:
    """Find clinical trials, RCTs, and systematic reviews."""
    r = requests.get(
        f"{BASE}/search",
        params={
            "q": f"{topic} clinical trial study randomized controlled",
            "limit": 8,
        },
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
    )
    return r.json().get("results", [])


def find_treatment_guidelines(topic: str) -> list[dict]:
    """Find evidence-based treatment guidelines and clinical protocols."""
    r = requests.get(
        f"{BASE}/search",
        params={
            "q": f"{topic} treatment guidelines clinical protocol evidence-based",
            "limit": 5,
        },
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
    )
    return r.json().get("results", [])
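Because the two searches use overlapping query terms, they may return some of the same pages. A minimal sketch of merging result lists before scraping, assuming each result is a dict with a "url" key as in the code above (the helper name dedupe_by_url is hypothetical, not part of Superhighway):

```python
def dedupe_by_url(*result_lists: list[dict]) -> list[dict]:
    """Merge search result lists, keeping the first result seen for each URL."""
    seen = set()
    unique = []
    for results in result_lists:
        for item in results:
            # Treat URLs that differ only by a trailing slash as the same page
            url = (item.get("url") or "").rstrip("/")
            if not url or url in seen:
                continue
            seen.add(url)
            unique.append(item)
    return unique
```

Calling dedupe_by_url(studies_raw, guidelines_raw) before the scrape step would avoid spending two calls on one page, at the cost of no longer knowing which search a page came from.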
5. Scrape study and guideline pages
/scrape turns each medical page into clean, LLM-ready markdown — inclusion criteria, endpoints, outcomes, and study details — so the LLM summarizes real content, not just a title.
def scrape_medical_page(url: str) -> dict:
    """Scrape a medical page for study details, inclusion criteria, outcomes."""
    r = requests.get(
        f"{BASE}/scrape",
        params={"url": url},
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
    )
    data = r.json()
    return {
        "url": url,
        "title": data.get("title", ""),
        "content": data.get("markdown", "")[:2500],
    }
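The hard slice at 2,500 characters in scrape_medical_page can cut a page off mid-sentence. One possible refinement, sketched here as a hypothetical helper (truncate_at_boundary is not part of the original code), backs up to the nearest paragraph or sentence break when one falls in the second half of the window:

```python
def truncate_at_boundary(text: str, limit: int = 2500) -> str:
    """Cut text to at most `limit` characters, preferring a clean break."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # Try a paragraph break first, then a sentence end
    for marker in ("\n\n", ". "):
        idx = cut.rfind(marker)
        # Only accept a break in the second half, so little content is lost
        if idx >= limit // 2:
            return cut[: idx + len(marker)].rstrip()
    return cut  # no usable boundary: fall back to the hard cut
```

Swapping data.get("markdown", "")[:2500] for truncate_at_boundary(data.get("markdown", "")) would keep the same length cap while handing the LLM complete sentences.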
6. Get recent medical news
/news surfaces the time-sensitive layer a guideline can't give you — FDA approvals, fresh trial results, treatment breakthroughs, and recalls.
def get_medical_news(topic: str) -> list[dict]:
    """Get recent FDA approvals, trial results, treatment breakthroughs."""
    r = requests.get(
        f"{BASE}/news",
        params={
            "q": f"{topic} FDA approval clinical trial results treatment",
            "count": 6,
        },
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
    )
    return r.json().get("articles", [])
7. Generate the research brief with an LLM
Now hand everything to the LLM. The system prompt pins the output to the reader's level, forbids medical advice, and forces every claim back to the sources — anything the sources don't support is flagged, not invented. The output is structured JSON so it slots straight into a doc or notebook.
from openai import OpenAI

llm = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))


def generate_health_brief(
    topic: str,
    condition_research: str,
    studies: list[dict],
    guidelines: list[dict],
    news: list[dict],
    context: dict,
) -> dict | None:
    """Generate a structured health research brief."""
    # Pass each study's URL along so the model can cite it instead of inventing one
    study_text = "\n".join(
        f"- {s['title'][:80]} ({s['url']}): {s['content'][:400]}"
        for s in studies[:5]
        if s.get("content")
    )
    guideline_text = "\n".join(
        f"- {g['title'][:80]}"
        for g in guidelines[:3]
    )
    news_text = "\n".join(
        f"- {n.get('title', '')} ({n.get('source', '')})"
        for n in news[:5]
    )
    audience = context.get("audience", "patient and caregiver")

    response = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": f"""You are a health research assistant helping a {audience} understand medical literature.
Write clearly and accessibly. Only report information from the provided sources.
Do NOT provide medical advice, diagnosis, or treatment recommendations.
Always remind the user to consult their healthcare provider.
If a source doesn't clearly state a fact, say 'sources unclear on this point.'""",
            },
            {
                "role": "user",
                "content": f"""Research brief on: {topic}

Condition/Treatment Research:
{condition_research[:2000]}

Clinical Studies Found:
{study_text}

Treatment Guidelines:
{guideline_text}

Recent News:
{news_text}

Return JSON with:
- topic: string
- overview: string (2-3 sentences — what this is and why it matters)
- how_it_works: string (mechanism / pathophysiology in plain language)
- current_standard_of_care: list of 3-4 strings (established treatments/approaches)
- active_research_areas: list of 3 strings (what researchers are currently investigating)
- recent_developments: list of 2-3 strings (from the news — approvals, trial results)
- study_highlights: list of objects, each with:
  - title: string
  - key_finding: string (1 sentence)
  - study_type: string (e.g. 'RCT', 'systematic review', 'observational')
  - url: string
- questions_for_your_doctor: list of 5 strings (specific, actionable questions to ask)
- reliable_sources: list of 3 strings (authoritative websites to learn more — e.g. 'NIH MedlinePlus', 'Mayo Clinic', 'CDC')
- disclaimer: "This summary is for informational purposes only. Always consult a qualified healthcare professional before making medical decisions." """,
            },
        ],
        response_format={"type": "json_object"},
    )

    try:
        return json.loads(response.choices[0].message.content)
    except (json.JSONDecodeError, TypeError):
        # TypeError covers a response whose content is None
        return None
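JSON mode makes the response parse as JSON, but it does not guarantee that every requested field is present or has the expected type. A minimal sketch of a shape check, using the field names from the prompt above (EXPECTED_FIELDS and brief_problems are hypothetical names introduced for illustration):

```python
# Field names and container types requested in the prompt above
EXPECTED_FIELDS = {
    "topic": str,
    "overview": str,
    "how_it_works": str,
    "current_standard_of_care": list,
    "active_research_areas": list,
    "recent_developments": list,
    "study_highlights": list,
    "questions_for_your_doctor": list,
    "reliable_sources": list,
    "disclaimer": str,
}


def brief_problems(brief: dict) -> list[str]:
    """Return a list of missing or mistyped fields; empty means the shape is OK."""
    problems = []
    for key, expected_type in EXPECTED_FIELDS.items():
        if key not in brief:
            problems.append(f"missing: {key}")
        elif not isinstance(brief[key], expected_type):
            problems.append(f"wrong type: {key}")
    return problems
```

A caller could retry the LLM call, or fall back to printing only the fields that did arrive, whenever brief_problems returns a non-empty list.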
8. Wire up the full pipeline
The orchestrator runs the steps in order: research the condition, find studies and guidelines, scrape the top pages for detail, pull recent news, and hand the whole bundle to the LLM. A context dict tunes the audience in one place.
def research_health_topic(
    topic: str,
    context: dict | None = None,
    max_pages: int = 5,
) -> dict | None:
    """
    Research a medical topic or condition.

    topic: e.g. "type 2 diabetes treatment", "migraine prevention", "knee osteoarthritis"
    context: {
        "audience": "patient and caregiver" | "medical student" | "healthcare professional",
    }
    """
    if context is None:
        context = {"audience": "patient and caregiver"}

    print(f"Researching: {topic}")

    # Deep condition research
    print("Researching condition...")
    condition_research = research_condition(topic)

    # Find studies and guidelines
    print("Finding clinical studies...")
    studies_raw = find_clinical_studies(topic)
    guidelines_raw = find_treatment_guidelines(topic)

    # Scrape pages for details
    print(f"Extracting details from {min(len(studies_raw), max_pages)} sources...")
    studies = []
    for result in studies_raw[:max_pages]:
        page = scrape_medical_page(result["url"])
        if page["content"]:
            page["title"] = page["title"] or result.get("title", "")
            studies.append(page)

    guidelines = []
    for result in guidelines_raw[:2]:
        page = scrape_medical_page(result["url"])
        if page["content"]:
            guidelines.append(page)

    # Recent news
    print("Fetching recent medical news...")
    news = get_medical_news(topic)

    # Generate brief
    print("Generating research brief...")
    return generate_health_brief(
        topic, condition_research, studies, guidelines, news, context
    )
def print_brief(brief: dict | None) -> None:
    if not brief:
        print("Could not generate brief.")
        return

    print(f"\n{'='*60}")
    print(f"Health Research Brief: {brief.get('topic', 'Topic')}")
    print(f"{'='*60}")
    print(f"\n! {brief.get('disclaimer', '')}\n")
    print(f"{brief.get('overview', '')}\n")
    print(f"How it works: {brief.get('how_it_works', '')}\n")

    soc = brief.get("current_standard_of_care", [])
    if soc:
        print("Current Standard of Care:")
        for s in soc:
            print(f"  * {s}")

    research = brief.get("active_research_areas", [])
    if research:
        print("\nActive Research Areas:")
        for a in research:
            print(f"  -> {a}")

    developments = brief.get("recent_developments", [])
    if developments:
        print("\nRecent Developments:")
        for d in developments:
            print(f"  ! {d}")

    studies = brief.get("study_highlights", [])
    if studies:
        print("\nKey Studies:")
        for s in studies[:3]:
            print(f"  [{s.get('study_type', '?')}] {s.get('title', '')[:70]}")
            print(f"    {s.get('key_finding', '')[:100]}")

    questions = brief.get("questions_for_your_doctor", [])
    if questions:
        print("\nQuestions to Ask Your Doctor:")
        for i, q in enumerate(questions, 1):
            print(f"  {i}. {q}")

    sources = brief.get("reliable_sources", [])
    if sources:
        print("\nReliable Sources:")
        for s in sources:
            print(f"  - {s}")
if __name__ == "__main__":
    import sys

    topic = " ".join(sys.argv[1:]) if len(sys.argv) > 1 else "type 2 diabetes management"
    CONTEXT = {
        "audience": "patient and caregiver",
    }
    brief = research_health_topic(topic, CONTEXT, max_pages=5)
    # print_brief reports a failed (None) brief itself, so call it unconditionally
    print_brief(brief)
9. What you can research
- Pre-appointment prep — research your diagnosis before a doctor's visit and generate targeted questions to bring with you.
- Clinical trial discovery — find active trials for a condition and research inclusion / exclusion criteria.
- Treatment comparison research — run the agent for each treatment option and compare the current evidence side by side.
- Medical student study — research a clinical entity with configurable depth for exam prep.
- Health journalism — get a structured starting point for medical reporting with key studies and recent news.
10. Extending the agent
- ClinicalTrials.gov integration — add /search?q=site:clinicaltrials.gov+{topic}+recruiting to bias toward open, recruiting trials.
- Drug interaction research — run the agent for each drug in a regimen and check for overlap in mechanism.
- Longitudinal monitoring — run weekly for a condition and diff recent_developments to catch new approvals and trial results.
- Patient-friendly summaries — use /research output with audience: "patient and caregiver" to generate plain-English explainers.
11. Getting your API key
Grab a free Superhighway key at /pricing (1,000 calls/month, no credit card). For an agent that provisions its own access, skip the key entirely with x402: it pays $0.002 per call in USDC on Base — no signup, no key management. See the x402 pay-per-call guide for the wallet setup.
For related builds, the academic research agent covers the literature-review synthesis pattern, and the financial research agent uses the same four-endpoint pattern on companies and markets.