Build a lead generation agent
Manual prospecting is the slowest part of B2B sales: searching for companies that fit your ideal customer profile, digging through each website for a contact and a hook, and copying it all into a spreadsheet. This guide builds a Python agent that does the whole loop automatically. It chains /search (find companies matching an ICP), /scrape (pull contact info, team, and product details off each site), and /research (deep company background), then uses an LLM to score fit and emit a CRM-ready record. Output is a CSV of qualified leads you can import into HubSpot or Salesforce.
1. What you'll build
A Python agent that:
- Takes an ideal customer profile (ICP): industry, company size, location, keywords
- Searches for companies matching that profile using /search
- Scrapes each company's website with /scrape to extract contact info, team, products, and a company description
- Uses /research for deeper background on promising leads
- Uses an LLM to score and structure each lead into a JSON record
- Writes the qualified leads to a CSV ready for CRM import
2. Setup
```bash
pip install openai requests python-dotenv
```
Create a .env file with your two keys:
```
SUPERHIGHWAY_API_KEY=your_key_here
OPENAI_API_KEY=your_key_here
```
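One catch with the .env approach: `os.getenv` only sees these values after python-dotenv's `load_dotenv()` has run, and a missing key otherwise likely surfaces much later as an authorization error, because the header becomes `Bearer None`. A minimal startup check, assuming `load_dotenv()` is called first; `missing_keys` and `check_env` are hypothetical helpers, not part of python-dotenv:

```python
import os

# The two keys this guide reads from .env
REQUIRED_KEYS = ("SUPERHIGHWAY_API_KEY", "OPENAI_API_KEY")


def missing_keys(names: tuple[str, ...] = REQUIRED_KEYS) -> list[str]:
    """Return the required environment variables that are unset or empty."""
    return [name for name in names if not os.environ.get(name)]


def check_env() -> None:
    """Raise a readable error now instead of sending 'Bearer None' to the API later."""
    missing = missing_keys()
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Call `check_env()` right after `load_dotenv()` at the top of the script so a typo in .env fails before any API call is made.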
3. Search for companies matching your ICP
Start by turning your ICP into a search query. /search returns a list of candidate companies — the building block for the whole pipeline.
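```python
import json
import os

import requests
from dotenv import load_dotenv

load_dotenv()  # Copy the keys from .env into os.environ; os.getenv won't see them otherwise

SUPERHIGHWAY_KEY = os.getenv("SUPERHIGHWAY_API_KEY")
BASE = "https://superhighway.walls.sh"


def search_leads(
    industry: str,
    role: str = "founder OR CEO OR CTO",
    location: str = "",
    keywords: str = ""
) -> list[dict]:
    """Find candidate companies in a target industry.

    `role` is not added to the search query (it describes people, not
    companies); the ICP's role is used later when the LLM scores each lead.
    """
    # Collapse the extra spaces left behind when keywords or location are empty
    query = " ".join(f"{industry} company {keywords} {location}".split())
    r = requests.get(
        f"{BASE}/search",
        params={"q": query, "limit": 10},
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
        timeout=30,
    )
    r.raise_for_status()  # Surface HTTP errors (e.g. a bad key) instead of returning []
    return r.json().get("results", [])
```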
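Search results are not always homepages. If /search returns a deep link such as a blog post, `scrape_company` in the next step would append "/about" to that post's URL rather than to the site root. A minimal sketch that normalizes each result URL first, assuming results carry a "url" field as the pipeline expects; `site_root` is a hypothetical helper:

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def site_root(url: str) -> Optional[str]:
    """Reduce a search-result URL to scheme://host, or None if it isn't http(s)."""
    parts = urlsplit(url.strip())
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None
    # Drop path, query, and fragment so "/about", "/team", etc. attach to the site root
    return urlunsplit((parts.scheme, parts.netloc.lower(), "", "", ""))
```

In the pipeline you could call `site_root(result["url"])` before scraping and skip the candidate when it returns None.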
4. Scrape company website for contact and team info
/scrape returns each page as clean Markdown — no nav, scripts, or cookie banners. We sweep the common pages where contact details and team bios live, and truncate each so the prompt stays small.
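```python
def scrape_company(url: str) -> dict:
    """Scrape the homepage plus the /about, /team, and /contact pages."""
    pages = {}
    for path in ["", "/about", "/team", "/contact"]:
        target = url.rstrip("/") + path
        try:
            r = requests.get(
                f"{BASE}/scrape",
                params={"url": target},
                headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
                timeout=60,
            )
            r.raise_for_status()
            data = r.json()
        except (requests.RequestException, ValueError):
            continue  # Many sites have no /team or /contact page; skip it and keep going
        if data.get("markdown"):
            pages[path or "home"] = data["markdown"][:2000]  # Cap each page at 2,000 chars
    return pages
```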
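The LLM in step 6 is asked to extract contact info, but a regex pass over the same Markdown gives a deterministic fallback when the model leaves `contact_email` empty. `find_emails` is a hypothetical helper, and the pattern is intentionally loose:

```python
import re

# Simple local@domain.tld pattern. It misses obfuscated addresses ("name [at] acme.com")
# and can match look-alike strings, so treat hits as candidates, not verified contacts.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(pages: dict[str, str]) -> list[str]:
    """Collect unique, lowercased email addresses from scraped pages in first-seen order."""
    seen: dict[str, None] = {}  # dict keeps insertion order, unlike set
    for markdown in pages.values():
        for match in EMAIL_RE.findall(markdown):
            seen.setdefault(match.lower(), None)
    return list(seen)
```

For example, if the scored lead has no `contact_email`, you could fill it with the first address `find_emails(pages)` returns and flag it for manual verification.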
5. Deep research on promising leads
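For promising candidates, /research pulls multi-source background (funding, team size, product positioning) that rarely lives on the company's own site. This is the context that makes outreach personal. The pipeline in step 7 uses a simple proxy for "promising": it researches any company whose site returned scrapeable content.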
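```python
def research_company(company_name: str) -> str:
    """Get multi-source background on a company (funding, team, product)."""
    try:
        r = requests.get(
            f"{BASE}/research",
            params={
                "q": f"{company_name} company funding team product",
                "pages": 4
            },
            headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
            timeout=120,
        )
        r.raise_for_status()
        data = r.json()
    except (requests.RequestException, ValueError):
        return ""  # Research is optional context; the lead can still be scored without it
    # Prefer the synthesized summary, fall back to raw Markdown, and cap at 3,000 chars
    return (data.get("synthesis") or data.get("markdown") or "")[:3000]
```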
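None of the Superhighway calls in steps 3 to 5 retry when a request fails transiently. A minimal retry-with-backoff sketch; `with_retries` is a hypothetical helper, and its default relies on `requests.RequestException` subclassing `OSError` (via `IOError`), so it catches requests' network errors without importing requests:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: tuple = (OSError,),  # requests.RequestException is an OSError subclass
) -> T:
    """Run call(), retrying on retry_on errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # Out of attempts: re-raise the last error for the caller
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... with the defaults
    raise ValueError("attempts must be at least 1")
```

Wrap the raw `requests.get(...)` call rather than a function that already swallows errors, or the retry never sees a failure. Note that the `HTTPError` raised by `raise_for_status()` is also a `RequestException`, so 4xx responses would be retried too unless you narrow `retry_on`.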
6. Score and structure the lead with an LLM
Now the LLM reads the scraped pages plus the research and emits a structured lead record judged against your ICP: fit score, contact, key person, pain points, and a suggested next action. `LEAD_SCHEMA` defines the shape of that CRM-ready record.
```python
from openai import OpenAI

llm = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

LEAD_SCHEMA = {
    "type": "object",
    "properties": {
        "company_name": {"type": "string"},
        "website": {"type": "string"},
        "description": {"type": "string"},
        "industry": {"type": "string"},
        "estimated_size": {"type": "string", "enum": ["1-10", "11-50", "51-200", "201-1000", "1000+", "unknown"]},
        "contact_email": {"type": "string"},
        "key_person": {"type": "string"},
        "key_person_role": {"type": "string"},
        "fit_score": {"type": "integer", "minimum": 1, "maximum": 10},
        "fit_reason": {"type": "string"},
        "pain_points": {"type": "array", "items": {"type": "string"}},
        "next_action": {"type": "string"}
    },
    "required": ["company_name", "website", "description", "fit_score", "fit_reason"]
}


def score_lead(url: str, pages: dict, research: str, icp: dict) -> dict | None:
    """Use an LLM to score and structure a lead based on ICP fit."""
    # Send every scraped page: /contact is swept last and often holds the email
    pages_text = "\n\n".join(
        f"### {page_name}\n{content}"
        for page_name, content in pages.items()
    )
    response = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": f"""You are a B2B sales researcher. Evaluate this company as a potential lead for our ideal customer profile (ICP):
ICP:
- Industry: {icp.get("industry", "any")}
- Target role: {icp.get("role", "any")}
- Location: {icp.get("location", "any")}
- Keywords: {icp.get("keywords", "any")}
Score fit from 1-10. Extract contact info if present. Identify pain points relevant to our ICP.
Respond with one JSON object that follows this JSON Schema:
{json.dumps(LEAD_SCHEMA)}"""
            },
            {
                "role": "user",
                "content": f"Company URL: {url}\n\nWebsite content:\n{pages_text}\n\nResearch:\n{research}"
            }
        ],
        # JSON mode returns a JSON object but does not enforce LEAD_SCHEMA,
        # and the API rejects the request unless the messages mention "JSON"
        response_format={"type": "json_object"}
    )
    try:
        lead = json.loads(response.choices[0].message.content)
    except (json.JSONDecodeError, TypeError):
        return None  # Malformed or empty model output
    lead["website"] = url
    return lead
```
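JSON mode makes the model return a JSON object, not one that follows `LEAD_SCHEMA`: it can still omit a required field, return "7" as a string, or score outside 1 to 10. A minimal post-processing sketch; `normalize_lead` is a hypothetical helper, and `REQUIRED_FIELDS` simply mirrors `LEAD_SCHEMA["required"]`:

```python
from typing import Optional

# Mirrors LEAD_SCHEMA["required"]
REQUIRED_FIELDS = ("company_name", "website", "description", "fit_score", "fit_reason")


def normalize_lead(raw: dict) -> Optional[dict]:
    """Coerce an LLM lead record into CRM-safe types, or return None if it's unusable."""
    if any(field not in raw for field in REQUIRED_FIELDS):
        return None
    lead = dict(raw)  # Shallow copy so the caller's dict isn't mutated
    try:
        # Models sometimes return "7" or 7.0 instead of 7
        score = int(float(lead["fit_score"]))
    except (TypeError, ValueError, OverflowError):
        return None
    lead["fit_score"] = max(1, min(10, score))  # Clamp to the schema's 1-10 range
    if isinstance(lead.get("pain_points"), list):
        # CSV cells can't hold lists, so join pain points into one string
        lead["pain_points"] = "; ".join(str(p) for p in lead["pain_points"])
    return lead
```

Applying it to the output of `score_lead` (for example `lead = normalize_lead(lead) if lead else None`) means the pipeline's fit-score threshold always compares real integers.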
7. The full pipeline
Wire the steps together: search for candidates, scrape and research each one, score it, and keep only leads above your fit threshold.
```python
import csv


def generate_leads(
    icp: dict,
    min_fit_score: int = 6,
    max_leads: int = 20
) -> list[dict]:
    """Find, scrape, research, and score companies that match an ICP.

    icp example:
    {
        "industry": "SaaS startups",
        "role": "CTO OR VP Engineering",
        "location": "San Francisco",
        "keywords": "developer tools"
    }
    """
    print(f"Searching for leads: {icp}")

    # Step 1: Find candidate companies
    # (search_leads requests 10 results, so max_leads above 10 has no extra effect)
    results = search_leads(
        icp["industry"],
        icp.get("role", ""),
        icp.get("location", ""),
        icp.get("keywords", "")
    )
    print(f"Found {len(results)} candidates")

    leads = []
    for result in results[:max_leads]:
        url = result.get("url")
        if not url:
            continue  # Skip malformed results instead of raising KeyError
        company_name = result.get("title") or url
        print(f"  Processing: {company_name}")

        # Step 2: Scrape company pages
        pages = scrape_company(url)

        # Step 3: Research only companies whose site returned content
        research = research_company(company_name) if pages else ""

        # Step 4: Score and structure
        lead = score_lead(url, pages, research, icp)
        try:
            score = int(lead.get("fit_score", 0)) if lead else 0
        except (TypeError, ValueError):
            score = 0  # A non-numeric score from the model counts as unqualified
        if score >= min_fit_score:
            lead["fit_score"] = score
            leads.append(lead)
            print(f"    Qualified (score: {score}/10)")
        else:
            print("    Skipped (low fit or no data)")

    return leads


def save_leads(leads: list[dict], filename: str = "leads.csv"):
    if not leads:
        print("No qualified leads found.")
        return
    # Use the union of keys across all leads so no field is silently dropped
    fieldnames = list(dict.fromkeys(key for lead in leads for key in lead))
    # CSV cells can't hold lists, so join list fields such as pain_points
    rows = [
        {k: "; ".join(map(str, v)) if isinstance(v, list) else v for k, v in lead.items()}
        for lead in leads
    ]
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    print(f"\nSaved {len(leads)} qualified leads to {filename}")


if __name__ == "__main__":
    ICP = {
        "industry": "SaaS developer tools",
        "role": "CTO OR VP Engineering",
        "location": "San Francisco Bay Area",
        "keywords": "API platform infrastructure"
    }
    leads = generate_leads(ICP, min_fit_score=6, max_leads=15)
    save_leads(leads)

    # Print summary
    for lead in leads:
        print(f"\n{lead.get('company_name', 'Unknown')} - Score: {lead.get('fit_score')}/10")
        print(f"  {(lead.get('description') or '')[:100]}")
        print(f"  Contact: {lead.get('contact_email') or 'not found'}")
        print(f"  Next action: {lead.get('next_action', '')}")
```
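The loop in `generate_leads` handles candidates one at a time, and each one costs several network round-trips (four scrapes, a research call, and an LLM call). Because that work is I/O-bound, a thread pool can overlap it. A minimal sketch, assuming you factor the loop body into a function that takes one search result and returns a qualified lead or None; `map_concurrently` is a hypothetical helper:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def map_concurrently(
    worker: Callable[[T], Optional[R]],
    items: Iterable[T],
    max_workers: int = 4,
) -> list[R]:
    """Run worker over items on a thread pool, keeping input order and dropping None or failed results."""
    results: list[R] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(worker, item) for item in items]
        for future in futures:  # Iterate in submission order, not completion order
            try:
                value = future.result()
            except Exception as exc:  # One bad site shouldn't sink the whole batch
                print(f"  Worker failed: {exc!r}")
                continue
            if value is not None:
                results.append(value)
    return results
```

Keep `max_workers` small: each worker hits both the Superhighway and OpenAI APIs, so parallelism multiplies your request rate against both services' limits.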
8. Extending the agent
- Multi-query sweep: run multiple ICP search queries and deduplicate the results by domain to widen the funnel.
- News trigger: use /news to find companies that recently raised funding or launched a product; those are high-intent triggers for outreach.
- LinkedIn enrichment: add a /search step to find the LinkedIn profile of the key person.
- Scheduling: run the agent weekly on GitHub Actions to build a growing lead pipeline.
- CRM sync: emit JSON formatted for HubSpot or Salesforce import instead of CSV.
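The multi-query sweep needs a dedupe step once results from several queries are combined. A minimal sketch, assuming each /search result is a dict with a "url" key, as `generate_leads` already expects; `dedupe_by_domain` is a hypothetical helper that treats "www.acme.com" and "acme.com" as one company but subdomains such as "docs.acme.com" as separate ones:

```python
from urllib.parse import urlsplit


def dedupe_by_domain(results: list[dict]) -> list[dict]:
    """Keep the first search result per domain, ignoring a leading 'www.'."""
    seen: set[str] = set()
    unique = []
    for result in results:
        host = urlsplit(result.get("url", "")).hostname or ""  # hostname is lowercased
        domain = host.removeprefix("www.")
        if domain and domain not in seen:
            seen.add(domain)
            unique.append(result)
    return unique
```

Concatenate the lists from several `search_leads` calls, pass them through `dedupe_by_domain`, and feed the result into the scraping loop.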
9. Responsible use
This agent is for researching publicly available company information — the same information visible in any web browser. Use it to research companies, not individuals, and always comply with the terms of service of the sites you scrape.
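One concrete, automatable check in this spirit is robots.txt. It is a crawler convention, not a site's terms of service, so passing it does not mean scraping is permitted, but a Disallow rule is a clear signal to skip a path. A minimal sketch using the standard library's `urllib.robotparser`, assuming you have already fetched the site's /robots.txt and split it into lines; `allowed_by_robots` is a hypothetical helper:

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_lines: list[str], url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt lines permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # Parses offline; no network request is made here
    return parser.can_fetch(user_agent, url)
```

In `scrape_company` you could skip any path for which `allowed_by_robots` returns False.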
10. Getting your API key
Grab a free Superhighway key at /pricing (1,000 calls/month, no credit card). For an agent that provisions its own access, skip the key entirely with x402: it pays $0.002 per call in USDC on Base — no signup, no key management. See the x402 pay-per-call guide for the wallet setup.
From here, the competitor analysis guide goes deeper on profiling companies, and the news briefing guide shows how to wire /news triggers into an outreach workflow.