Build a web change detection agent

RSS feeds are dying, but the need to know when a webpage changes is not. Competitor pricing, job boards, API changelogs, terms of service — all of it lives on pages that quietly update with no notification. This guide builds a small Python agent that watches any list of URLs, detects when their content changes, and uses an LLM to tell you what changed in plain English. The trick that makes it reliable: instead of diffing raw HTML, it diffs the clean Markdown that Superhighway's /scrape endpoint returns, so cosmetic UI noise doesn't trigger false alarms.

1. What you'll build

A self-contained Python agent that:

Scrapes a list of target URLs using Superhighway's /scrape endpoint
Stores a hash of the last-seen content for each URL
Detects changes by comparing the new hash against the stored one
When something changes, diffs old vs. new and uses an LLM to summarize it
Sends an alert (console, file, or Slack webhook) whenever a change is detected
Runs unattended on a cron job or scheduled GitHub Action

2. Setup

Install the three dependencies:

pip install requests openai python-dotenv

Create a .env file with your two keys:

SUPERHIGHWAY_API_KEY=your_key_here
OPENAI_API_KEY=your_key_here

3. Scrape a URL to Markdown

The core building block is a function that hits /scrape and returns the page as clean Markdown:

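import hashlib
import os

import requests
from dotenv import load_dotenv

load_dotenv()
SUPERHIGHWAY_KEY = os.getenv("SUPERHIGHWAY_API_KEY")

def scrape(url: str) -> str:
    """Fetch a page through /scrape and return its Markdown ("" if none is returned)."""
    r = requests.get(
        "https://superhighway.walls.sh/scrape",
        params={"url": url},
        headers={"Authorization": f"Bearer {SUPERHIGHWAY_KEY}"},
        timeout=30,  # without a timeout, a stalled request would hang the whole run
    )
    r.raise_for_status()  # surface 4xx/5xx responses as exceptions
    return r.json().get("markdown", "")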

The response is LLM-ready Markdown with the navigation, scripts, and ads stripped out — only the actual content remains:

content = scrape("https://example.com/pricing")
print(content[:500])  # Clean Markdown, ready for LLM
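
An unattended monitor will occasionally hit a timeout or a transient server error. A minimal sketch of a retry wrapper with exponential backoff follows. The name with_retries, its parameters, and the delay values are illustrative assumptions, not part of Superhighway's API.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on any exception with exponential backoff.

    Hypothetical helper: waits base_delay, then 2x, then 4x between attempts,
    and re-raises the last error once the attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts, let the caller see the error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

It could wrap the scraper as with_retries(lambda: scrape("https://example.com/pricing")). Note that this sketch retries every exception, including ones that will never succeed (a bad API key, for instance), so a real version would probably want to retry only timeouts and 5xx responses.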

4. Hash and compare content

To detect a change without storing entire pages, hash the content and compare hashes. The first time we see a URL we just record its hash as a baseline:

def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()

def has_changed(url: str, stored: dict[str, str]) -> tuple[bool, str]:
    content = scrape(url)
    new_hash = content_hash(content)
    old_hash = stored.get(url)

    if old_hash is None:
        stored[url] = new_hash
        return False, content  # First run, establish baseline

    if new_hash != old_hash:
        stored[url] = new_hash
        return True, content

    return False, content

Because the input is normalized Markdown, the hash only flips when the meaningful content changes — not when a tracking script or an ad slot shuffles around in the raw HTML.
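
If the Markdown for a page still varies slightly in whitespace between fetches, a light normalization pass before hashing can help. This is a minimal sketch, assuming trailing spaces and runs of blank lines are the only noise. The names normalize and stable_hash are hypothetical, and whether the step is needed at all depends on what /scrape returns for your pages.

```python
import hashlib
import re

def normalize(text: str) -> str:
    """Strip trailing whitespace on each line and collapse runs of blank lines."""
    lines = [line.rstrip() for line in text.strip().splitlines()]
    joined = "\n".join(lines)
    # Three or more newlines in a row become exactly one blank line.
    return re.sub(r"\n{3,}", "\n\n", joined)

def stable_hash(text: str) -> str:
    """SHA-256 of the normalized text (hypothetical variant of content_hash)."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
```

Swapping stable_hash in for content_hash would change every stored hash, so the first run afterwards should be treated as a fresh baseline.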

5. Summarize what changed with an LLM

A hash tells you that something changed but not what. Feed the old and new versions to an LLM and ask for a short summary. gpt-4o-mini is fast and cheap enough to run on every detected change:

from openai import OpenAI

llm = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def summarize_change(url: str, old_content: str, new_content: str) -> str:
    response = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a monitoring agent. Summarize what changed on a webpage in 2-3 sentences. Focus on the most important changes."},
            {"role": "user", "content": f"URL: {url}\n\nPREVIOUS VERSION:\n{old_content[:2000]}\n\nNEW VERSION:\n{new_content[:2000]}\n\nWhat changed?"}
        ]
    )
    return response.choices[0].message.content

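Truncating each version to 2,000 characters keeps the prompt small while still capturing the lead content, where most meaningful changes (pricing, headlines, new sections) tend to appear. The trade-off: the hash covers the whole page, so an edit further down still triggers an alert, but the model then sees two identical excerpts and has nothing real to summarize.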
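
An alternative to sending two truncated excerpts is to send only the lines that changed, which keeps the prompt small wherever on the page the edit sits. A minimal sketch using difflib from the standard library. The function name diff_for_prompt and the 4,000-character cap are assumptions, not part of the original agent.

```python
import difflib

def diff_for_prompt(old: str, new: str, max_chars: int = 4000) -> str:
    """Return a unified diff of two Markdown strings, capped at max_chars."""
    diff_lines = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile="previous",
        tofile="current",
        n=2,          # lines of unchanged context around each change
        lineterm="",  # the inputs carry no line endings, so add none
    )
    return "\n".join(diff_lines)[:max_chars]
```

The result could replace the PREVIOUS VERSION / NEW VERSION sections of the user message. One caveat: the monitoring loop in this guide stores only the first 5,000 characters of the previous version, so both sides would need to be truncated the same way (or stored in full) to avoid the tail of the page showing up as newly added.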

6. The full monitoring loop

Now wire it together. State persists to a JSON file so the agent remembers baselines across runs — we store both the hash (for detection) and a truncated copy of the content (so we can diff against it next time):

import json
from pathlib import Path

URLS_TO_MONITOR = [
    "https://example.com/pricing",
    "https://example.com/changelog",
]

STATE_FILE = Path("monitor_state.json")

def load_state() -> dict[str, str]:
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())
    return {}

def save_state(state: dict[str, str]) -> None:
    STATE_FILE.write_text(json.dumps(state, indent=2))

def alert(url: str, summary: str) -> None:
    print(f"\n🔔 CHANGE DETECTED: {url}")
    print(f"Summary: {summary}")
    # Add Slack/email hook here

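def run_check(urls: list[str], state: dict) -> None:
    for url in urls:
        prev_content = state.get(f"content:{url}", "")
        prev_hash = state.get(url)
        try:
            changed, new_content = has_changed(url, state)

            if changed and prev_content:
                summary = summarize_change(url, prev_content, new_content)
                alert(url, summary)
        except Exception as exc:
            # Keep going if one URL fails. Restore the old hash so a change whose
            # summary or alert failed is detected again on the next run.
            if prev_hash is not None:
                state[url] = prev_hash
            print(f"Check failed for {url}: {exc}")
            continue

        state[f"content:{url}"] = new_content[:5000]  # Store truncated content for diff
    save_state(state)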

if __name__ == "__main__":
    state = load_state()
    run_check(URLS_TO_MONITOR, state)
    print("Check complete.")

The first run establishes baselines silently. Every run after that, any URL whose content has changed gets diffed, summarized, and alerted on.
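
As written, the loop alerts on any hash change. If small edits (a date stamp, a typo fix) turn out to be too noisy, one option is to gate alerts on how different the two versions are. A minimal sketch using difflib.SequenceMatcher from the standard library. The names change_ratio and is_significant and the 0.02 threshold are assumptions to tune, not values from this guide.

```python
import difflib

def change_ratio(old: str, new: str) -> float:
    """Line-level dissimilarity: 0.0 for identical text, 1.0 for nothing in common."""
    matcher = difflib.SequenceMatcher(
        None, old.splitlines(), new.splitlines(), autojunk=False
    )
    return 1.0 - matcher.ratio()

def is_significant(old: str, new: str, threshold: float = 0.02) -> bool:
    """True when the dissimilarity score reaches the (assumed) threshold."""
    return change_ratio(old, new) >= threshold
```

In run_check, the condition could become changed and prev_content and is_significant(prev_content, new_content). A line-count score is blunt: a one-line price change on a long page scores low, so for pricing pages a threshold of 0.0 (alert on everything) may be the safer choice.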

7. Run on a schedule

A change detector is only useful if it runs itself. With the script saved as monitor.py, the simplest option is cron on any always-on machine:

# Check every 30 minutes
*/30 * * * * cd /path/to/project && python monitor.py >> monitor.log 2>&1

If you'd rather not keep a machine running, schedule it as a GitHub Action. Commit the script (and its monitor_state.json so baselines persist), add your two keys as repository secrets, and drop this in .github/workflows/monitor.yml:

name: Web Change Detection
on:
  schedule:
    - cron: '0 * * * *'
  workflow_dispatch:

jobs:
  monitor:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install requests openai python-dotenv
      - run: python monitor.py
        env:
          SUPERHIGHWAY_API_KEY: ${{ secrets.SUPERHIGHWAY_API_KEY }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}

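The workflow_dispatch trigger lets you run it manually from the Actions tab to test, while the cron schedule handles the hourly check. Note that the workflow as shown does not save state: each run starts from the committed monitor_state.json, and any updated baselines are discarded when the job ends. To persist state across runs in CI, add a step that commits the updated monitor_state.json back to the repo (the job needs contents: write permission to push) or stash it in the Actions cache.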

8. Adding Slack alerts

Console output is fine for testing, but for a real monitor you want a push notification. An incoming Slack webhook takes a few lines with no extra dependencies:

import json
import urllib.request

def alert_slack(url: str, summary: str, webhook_url: str) -> None:
    # Slack incoming webhooks accept a JSON body with a "text" field
    payload = {"text": f"*Change detected:* {url}\n{summary}"}
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST"
    )
    # The timeout keeps a slow Slack response from stalling the run
    urllib.request.urlopen(req, timeout=10)

Swap the print calls in alert() for a call to alert_slack() with your webhook URL, and changes land directly in a channel.
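
One way to wire this in without hard-coding the webhook is to read it from the environment and fall back to console output when it is absent. A minimal sketch: the variable name SLACK_WEBHOOK_URL, the build_slack_payload helper, and the returned channel label are assumptions made for illustration.

```python
import json
import os
import urllib.request

def build_slack_payload(url: str, summary: str) -> bytes:
    """Encode the JSON body for a Slack incoming webhook ({"text": ...})."""
    text = f"*Change detected:* {url}\n{summary}"
    return json.dumps({"text": text}).encode("utf-8")

def alert(url: str, summary: str) -> str:
    """Post to Slack if SLACK_WEBHOOK_URL is set, else print. Returns the channel used."""
    webhook_url = os.getenv("SLACK_WEBHOOK_URL")  # assumed variable name
    if not webhook_url:
        print(f"CHANGE DETECTED: {url}\nSummary: {summary}")
        return "console"
    req = urllib.request.Request(
        webhook_url,
        data=build_slack_payload(url, summary),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        resp.read()  # drain the response; urlopen raises on HTTP errors
    return "slack"
```

Locally the variable would go in .env next to the two API keys. In the GitHub Action it would be a third repository secret passed through the env block.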

9. What to monitor

Anything that updates without telling you is a candidate:

Competitor pricing pages — know the moment a tier or price moves
Job board listings — catch new roles as they post
Docs and changelogs — get notified of new API releases
Legal and terms-of-service pages — track policy changes that affect you
Product landing pages — spot new features and positioning shifts
Any public page without an RSS feed

10. Why /scrape instead of raw requests

You could fetch pages with plain requests.get(), but the diff would be unusable:

Raw requests return full HTML — scripts, ad slots, navigation, tracking pixels, CSRF tokens. Most of that changes on every load, so a hash of the HTML flips constantly and drowns real changes in noise.
Superhighway's /scrape returns clean Markdown — just the actual content. A UI refresh or a new analytics script won't move the hash, only a real content change will.

That difference is what makes the hash-based diff reliable enough to alert on. The LLM summarization step also gets a far cleaner input, so its summaries describe what a human would actually care about.
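
The effect is easy to reproduce offline. In the sketch below, two loads of the same made-up page differ only in a CSRF token. Their raw HTML hashes differ, while the hashes of the visible text match. The TextOnly parser is a crude stand-in built on the standard library's html.parser, used purely for illustration. It is not how /scrape works internally.

```python
import hashlib
from html.parser import HTMLParser

class TextOnly(HTMLParser):
    """Crude stand-in for content extraction: keeps visible text, skips script/style."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def visible_text(html: str) -> str:
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)

def sha(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Hypothetical page: same content, different per-request token.
PAGE = (
    '<html><body><input type="hidden" name="csrf" value="{token}">'
    '<h1>Pricing</h1><p>Pro: $20</p>'
    '<script>track("{token}")</script></body></html>'
)
load_1 = PAGE.format(token="a1b2")
load_2 = PAGE.format(token="c3d4")

print(sha(load_1) == sha(load_2))                              # False: raw HTML differs
print(sha(visible_text(load_1)) == sha(visible_text(load_2)))  # True: content is the same
```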

11. Getting your API key

Grab a free Superhighway key at /pricing (1,000 calls/month, no credit card). For an agent that provisions its own access, skip the key entirely with x402: it pays $0.002 per call in USDC on Base — no signup, no key management. See the x402 pay-per-call guide for the wallet setup.

From here, the news briefing agent guide builds a similar scheduled agent on the /news endpoint, and the Groq guide shows how to swap in a faster, cheaper model for the summarization step.