Build an academic research agent
A literature review means weeks of work: hunting for papers, skimming abstracts, mapping out the major threads, spotting the gaps nobody has filled, and figuring out what to read first. This guide builds a Python agent that runs that whole loop in minutes. It chains all four Superhighway endpoints — /search to find papers and survey articles, /scrape to pull abstracts and key findings off paper pages, /research for a multi-source synthesis of the field, and /news for the latest preprints and breakthrough coverage — then uses an LLM to emit a structured literature review as JSON: key themes, methodology patterns, research gaps, paper summaries, and a recommended reading order.
1. What you'll build
A Python agent that takes a research topic and produces a structured literature review:
- Searches for recent papers and academic resources on the topic
- Scrapes individual papers for abstracts, methodology, and key findings
- Gets a deep synthesis of the research landscape
- Pulls recent news about breakthroughs and new publications
- Uses an LLM to generate a structured literature review with gaps, themes, and reading order as JSON
2. Setup
pip install openai requests python-dotenv
Create a .env file with your two keys:
SUPERHIGHWAY_API_KEY=your_key_here
OPENAI_API_KEY=your_key_here
3. Find papers on the topic
Start with /search. One call finds recent papers and academic blog posts; a second, narrower call hunts specifically for survey and review articles — the papers that already synthesize a subfield and give you the fastest way in.
import json
import os

import requests
from dotenv import load_dotenv

load_dotenv()  # read SUPERHIGHWAY_API_KEY and OPENAI_API_KEY from .env

SUPERHIGHWAY_KEY = os.getenv("SUPERHIGHWAY_API_KEY")
BASE = "https://superhighway.walls.sh"

def find_papers(topic: str, year_filter: str | None = None) -> list[dict]:
    """Search for research papers and academic resources."""
    query = f"{topic} research paper study"
    if year_filter:
        query += f" {year_filter}"
    r = requests.get(
        f"{BASE}/search",
        params={"q": query, "limit": 10},
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
        timeout=30,  # fail instead of hanging on a stalled connection
    )
    return r.json().get("results", [])

def find_survey_papers(topic: str) -> list[dict]:
    """Find survey/review papers that synthesize existing work."""
    r = requests.get(
        f"{BASE}/search",
        params={"q": f"{topic} survey review systematic literature", "limit": 5},
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
        timeout=30,
    )
    return r.json().get("results", [])
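The two searches can return overlapping results, so it may be worth merging them before scraping. The following is a minimal sketch of one way to do that. `normalize_url` and `dedupe_results` are hypothetical helpers, not part of the pipeline above, and the sketch assumes each result is a dict with a `url` key, as the scraping step later does.

```python
from urllib.parse import urlsplit

def normalize_url(url: str) -> str:
    """Build a comparison key: lowercase host (no www.), path, and query.

    The scheme, a trailing slash, and any #fragment are ignored.
    """
    parts = urlsplit(url.strip())
    host = parts.netloc.lower().removeprefix("www.")
    path = parts.path.rstrip("/")
    query = f"?{parts.query}" if parts.query else ""
    return f"{host}{path}{query}"

def dedupe_results(*result_lists: list[dict]) -> list[dict]:
    """Merge result lists, keeping the first occurrence of each URL."""
    seen: set[str] = set()
    merged: list[dict] = []
    for results in result_lists:
        for item in results:
            url = item.get("url")
            if not url:
                continue  # nothing to scrape without a URL
            key = normalize_url(url)
            if key not in seen:
                seen.add(key)
                merged.append(item)
    return merged
```

Calling `dedupe_results(find_papers(topic), find_survey_papers(topic))` would then give one list with each page listed once, in the order the searches returned them.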
4. Scrape paper details
/scrape turns each paper URL into clean, LLM-ready markdown. This is where the agent pulls the abstract, authors, methodology, and key findings out of the page so the LLM has real content to summarize instead of just a title.
def scrape_paper(url: str) -> dict:
    """Scrape a paper page for abstract, authors, and key findings."""
    r = requests.get(
        f"{BASE}/scrape",
        params={"url": url},
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
        timeout=30,  # fail instead of hanging on a stalled connection
    )
    data = r.json()
    return {
        "url": url,
        "title": data.get("title", ""),
        # Cap the content so six papers still fit comfortably in the prompt.
        "content": (data.get("markdown") or "")[:2500],
    }
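The hard `[:2500]` slice can cut a paper off mid-sentence or mid-word. If that matters for the summaries, one option is to trim at a natural boundary instead. The helper below is a sketch only; `truncate_at_boundary` is a hypothetical name, and the rule of never giving up more than half the character budget is an arbitrary choice for illustration.

```python
def truncate_at_boundary(text: str, limit: int = 2500) -> str:
    """Trim text to at most `limit` characters, preferring a clean break."""
    if len(text) <= limit:
        return text
    window = text[:limit]
    # Try a paragraph break first, then a sentence end, then a word break.
    # `keep` is how many separator characters to retain (the period).
    for sep, keep in (("\n\n", 0), (". ", 1), (" ", 0)):
        cut = window.rfind(sep)
        if cut >= limit // 2:  # do not throw away more than half the budget
            return window[:cut + keep].rstrip()
    return window  # no usable boundary: fall back to the hard cut
```

In `scrape_paper`, this would replace the slice: `truncate_at_boundary(data.get("markdown") or "", 2500)`.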
5. Deep research synthesis
/research pulls multiple sources into a single synthesis of the field: the state of the art, common methodologies, and where the open questions sit. This is the grounding context that lets the LLM write a review that reads as if it came from someone who already knows the area.
def synthesize_topic(topic: str) -> str:
    """Get a multi-source synthesis of the research topic."""
    r = requests.get(
        f"{BASE}/research",
        params={
            "q": f"{topic} research methodology findings state of the art",
            "pages": 6
        },
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
        timeout=120,  # generous: a multi-source synthesis may take a while
    )
    data = r.json()
    # Prefer the synthesis field; fall back to raw markdown if it is absent.
    return data.get("synthesis", data.get("markdown", ""))[:3000]
6. Get recent breakthroughs and news
/news surfaces what just happened — new preprints, conference announcements, and breakthrough coverage. This is the time-sensitive layer a static survey paper can't give you, and it keeps the review current.
def get_research_news(topic: str) -> list[dict]:
    """Get recent breakthroughs, new papers, and conference coverage."""
    r = requests.get(
        f"{BASE}/news",
        params={"q": f"{topic} research breakthrough paper 2025", "count": 6},
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
        timeout=30,  # fail instead of hanging on a stalled connection
    )
    return r.json().get("articles", [])
7. Generate the literature review with an LLM
Now hand everything to the LLM. The system prompt pins the output to the reader's level and forbids reporting anything the sources don't support — no invented citations or findings. The output is structured JSON so it slots straight into a doc, a notebook, or a reference manager.
from openai import OpenAI

llm = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def generate_literature_review(
    topic: str,
    synthesis: str,
    papers: list[dict],
    survey_papers: list[dict],
    news: list[dict],
    review_params: dict
) -> dict | None:
    """Generate a structured literature review."""
    # Scraped papers: title, URL, and a short content excerpt.
    paper_summaries = "\n".join(
        f"- [{p['title'][:80]}]({p['url']}): {p['content'][:400]}"
        for p in papers[:6]
        if p.get("content")
    )
    # Raw search results may omit fields, so read them defensively.
    survey_summaries = "\n".join(
        f"- [{s.get('title', '')[:80]}]({s.get('url', '')})"
        for s in survey_papers[:3]
    )
    news_text = "\n".join(
        f"- {n.get('title', '')} ({n.get('source', '')})"
        for n in news[:5]
    )
    depth = review_params.get("depth", "graduate")
    angle = review_params.get("angle", "general overview")

    response = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": f"""You are an academic research assistant helping a {depth}-level researcher.
Write a structured literature review. Be specific and accurate.
Only report findings that are clearly supported by the provided sources.
Use academic language appropriate for a {depth} audience.
Target angle: {angle}"""
            },
            {
                "role": "user",
                "content": f"""Write a literature review on: {topic}

Research Synthesis:
{synthesis[:2000]}

Papers Found:
{paper_summaries}

Survey/Review Papers:
{survey_summaries}

Recent News & Breakthroughs:
{news_text}

Return JSON with:
- topic: string
- scope_statement: string (1 sentence: what this review covers and excludes)
- key_themes: list of 4-5 strings (major themes/threads in the literature)
- methodology_patterns: list of 3 strings (common research approaches used in this field)
- paper_summaries: list of objects, each with:
  - title: string
  - url: string
  - key_contribution: string (1 sentence)
  - methodology: string (brief)
  - relevance: "high" | "medium" | "low"
- research_gaps: list of 3-4 strings (open questions and underexplored areas)
- recommended_reading_order: list of strings (titles, ordered from foundational to specialized)
- recent_developments: list of 2-3 strings (from the news)
- suggested_next_searches: list of 3 strings (adjacent topics worth investigating)"""
            }
        ],
        response_format={"type": "json_object"}
    )

    try:
        return json.loads(response.choices[0].message.content)
    except (json.JSONDecodeError, TypeError):
        # TypeError covers a response whose message content is None.
        return None
8. Wire up the full pipeline
The orchestrator runs the steps in order: find papers and surveys, scrape the top results for detail, synthesize the landscape, pull recent news, and hand the whole bundle to the LLM. A review_params dict tunes the depth, angle, and recency in one place.
def research_topic(
    topic: str,
    review_params: dict | None = None,
    max_papers: int = 6
) -> dict | None:
    """
    Run the full literature review pipeline.

    review_params: {
        "depth": "undergraduate" | "graduate" | "expert",
        "angle": "technical implementation" | "survey of methods" | "practical applications",
        "year_filter": "2024 OR 2025"  # optional recency filter
    }
    """
    if review_params is None:
        review_params = {"depth": "graduate", "angle": "general overview"}

    print(f"Researching: {topic}")
    year_filter = review_params.get("year_filter")

    # Find papers
    print("Searching for papers...")
    papers_raw = find_papers(topic, year_filter)
    survey_papers = find_survey_papers(topic)

    # Scrape top papers for details
    print(f"Scraping {min(len(papers_raw), max_papers)} papers...")
    papers = []
    for result in papers_raw[:max_papers]:
        paper = scrape_paper(result["url"])
        if paper["content"]:
            papers.append(paper)

    # Deep synthesis
    print("Synthesizing research landscape...")
    synthesis = synthesize_topic(topic)

    # Recent news
    print("Fetching recent breakthroughs...")
    news = get_research_news(topic)

    # Generate review
    print("Generating literature review...")
    return generate_literature_review(
        topic, synthesis, papers, survey_papers, news, review_params
    )
def print_review(review: dict):
    if not review:
        print("Could not generate review.")
        return

    print(f"\n{'='*60}")
    print(f"Literature Review: {review.get('topic', 'Topic')}")
    print(f"{'='*60}")
    print(f"\nScope: {review.get('scope_statement', '')}\n")

    themes = review.get("key_themes", [])
    if themes:
        print("Key Themes:")
        for t in themes:
            print(f" * {t}")

    gaps = review.get("research_gaps", [])
    if gaps:
        print("\nResearch Gaps:")
        for g in gaps:
            print(f" ? {g}")

    summaries = review.get("paper_summaries", [])
    if summaries:
        print("\nKey Papers:")
        for p in summaries[:4]:
            rel = p.get("relevance", "?")
            print(f" [{rel.upper()}] {p.get('title', '')[:70]}")
            print(f" {p.get('key_contribution', '')[:100]}")

    reading_order = review.get("recommended_reading_order", [])
    if reading_order:
        print("\nReading Order (foundational -> specialized):")
        for i, title in enumerate(reading_order, 1):
            print(f" {i}. {title[:70]}")

    developments = review.get("recent_developments", [])
    if developments:
        print("\nRecent Developments:")
        for d in developments:
            print(f" ! {d}")

    next_searches = review.get("suggested_next_searches", [])
    if next_searches:
        print("\nAlso worth exploring:")
        for s in next_searches:
            print(f" -> {s}")
if __name__ == "__main__":
    import sys

    # Topic comes from the command line, with a default for quick testing.
    topic = " ".join(sys.argv[1:]) if len(sys.argv) > 1 else "retrieval-augmented generation"
    PARAMS = {
        "depth": "graduate",
        "angle": "technical implementation and practical applications",
        "year_filter": "2024 OR 2025",
    }
    review = research_topic(topic, PARAMS, max_papers=6)
    # print_review handles a failed (None) review by printing a message.
    print_review(review)
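In the pipeline above, a single failed scrape (a timeout, or a page whose response is not valid JSON) raises an exception and ends the whole run. One way to make the scraping loop more forgiving is to collect failures instead of raising. This is a sketch under assumptions: `scrape_many` and its `max_failures` cutoff are hypothetical, and the scrape function is passed in as a parameter so the helper can be exercised without network access.

```python
from typing import Callable

def scrape_many(
    urls: list[str],
    scrape_fn: Callable[[str], dict],
    max_failures: int = 3,
) -> tuple[list[dict], list[tuple[str, str]]]:
    """Scrape each URL, collecting failures instead of raising.

    Returns (papers_with_content, [(url, error_description), ...]).
    """
    papers: list[dict] = []
    failures: list[tuple[str, str]] = []
    for url in urls:
        if len(failures) >= max_failures:
            break  # repeated failures suggest an outage; stop early
        try:
            paper = scrape_fn(url)
        except Exception as exc:  # network errors, invalid JSON, and so on
            failures.append((url, f"{type(exc).__name__}: {exc}"))
            continue
        if paper.get("content"):  # skip pages that yielded no text
            papers.append(paper)
    return papers, failures
```

Inside `research_topic`, the loop could then become `papers, failures = scrape_many([r["url"] for r in papers_raw[:max_papers]], scrape_paper)`, with the failures printed as warnings.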
9. What you can research
- Dissertation literature review: cover a new subfield before your research proposal and get oriented in hours instead of weeks.
- Grant proposal background: quickly map the landscape and identify research gaps to position your contribution.
- Paper writing: use suggested_next_searches to find adjacent literature you should cite.
- Weekly research digest: run on a cron job to track new publications in your area; diff recent_developments week-over-week.
- Onboarding to a new field: set depth to "undergraduate" for an accessible overview before diving into primary sources.
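For the weekly digest, the week-over-week diff of recent_developments can be sketched as below. The snapshot file and both function names are hypothetical. Matching is on normalized text only, so an item the LLM rewords between runs will show up as new; treat the output as a rough filter rather than an exact diff.

```python
import json
from pathlib import Path

def new_developments(current: list[str], previous: list[str]) -> list[str]:
    """Return items in `current` whose normalized text is not in `previous`."""
    def norm(s: str) -> str:
        return " ".join(s.lower().split())  # case- and whitespace-insensitive
    seen = {norm(p) for p in previous}
    return [c for c in current if norm(c) not in seen]

def diff_against_snapshot(review: dict, snapshot: Path) -> list[str]:
    """Compare this run to the saved one, then overwrite the snapshot."""
    previous: list[str] = []
    if snapshot.exists():
        saved = json.loads(snapshot.read_text(encoding="utf-8"))
        previous = saved.get("recent_developments", [])
    fresh = new_developments(review.get("recent_developments", []), previous)
    snapshot.write_text(json.dumps(review, indent=2), encoding="utf-8")
    return fresh
```

A cron entry would run the pipeline, call `diff_against_snapshot(review, Path("last_review.json"))`, and send a digest only when the returned list is non-empty.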
10. Extending the agent
- Citation graph: scrape each paper's references section and build a graph to find the most-cited foundational papers.
- Author tracking: run find_papers() with an author's name in the query to surface that researcher's publications.
- arXiv integration: add /search?q={topic}+site:arxiv.org to bias toward preprints for cutting-edge results.
- Export to Zotero/BibTeX: pipe paper_summaries through a BibTeX formatter for direct citation-manager import.
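As a starting point for the BibTeX export, the sketch below turns paper_summaries into `@misc` stubs. The review schema carries no authors or publication year, so the entries contain only title, URL, and a note; those missing fields would need to come from another source. `bibtex_key`, `escape_bibtex`, and `to_bibtex` are hypothetical helpers, and the key scheme is an arbitrary choice.

```python
import re

def bibtex_key(title: str, index: int) -> str:
    """Build a citation key from the first three words plus a running index."""
    words = re.findall(r"[A-Za-z0-9]+", title.lower())
    stem = "".join(words[:3]) or "untitled"
    return f"{stem}{index}"

def escape_bibtex(value: str) -> str:
    """Drop braces and backslashes, then escape LaTeX special characters."""
    value = re.sub(r"[\\{}]", "", value)
    return re.sub(r"([&%#_$])", r"\\\1", value)

def to_bibtex(paper_summaries: list[dict]) -> str:
    """Format each summary that has a title as an @misc entry."""
    entries = []
    for i, p in enumerate(paper_summaries, 1):
        title = (p.get("title") or "").strip()
        if not title:
            continue  # an entry without a title is not worth importing
        fields = [f"  title = {{{escape_bibtex(title)}}}"]
        if p.get("url"):
            fields.append(f"  url = {{{p['url']}}}")
        if p.get("key_contribution"):
            fields.append(f"  note = {{{escape_bibtex(p['key_contribution'])}}}")
        body = ",\n".join(fields)
        entries.append(f"@misc{{{bibtex_key(title, i)},\n{body}\n}}")
    return "\n\n".join(entries)
```

Writing `to_bibtex(review["paper_summaries"])` to a `.bib` file gives something a citation manager can import, with author and year left to fill in by hand.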
11. Getting your API key
Grab a free Superhighway key at /pricing (1,000 calls/month, no credit card). For an agent that provisions its own access, skip the key entirely with x402: it pays $0.002 per call in USDC on Base — no signup, no key management. See the x402 pay-per-call guide for the wallet setup.
For related builds, the content research agent covers the deep-research synthesis pattern in more depth, and the financial research agent uses the same four-endpoint pattern on companies and markets.