The Engine

The hard infrastructure that makes anything live

The Engine is the reusable platform underneath every app. One clean, deduplicated stream of genuinely new content, from any source. Point it at a live source like Bluesky, an RSS, Atom, or JSON feed, or a feed-less HTML page we diff, and it auto-detects the format and normalizes the rest. There is no curated source list, just an engine built to handle whatever you point it at. Here is what runs under the hood.

Connectors

Every protocol.
One stream.

We speak the open web's real protocols natively: ATProto and Jetstream; RSS, Atom, and JSON feeds; WebSocket streams; and SSE endpoints. Together they cover every kind of source, whether it's an always-on push source, a feed we poll, or a feed-less HTML page we diff. Each source type has a purpose-built connector that normalizes into one shared event shape.
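As a rough illustration of how a source might be routed to the right connector, here is a minimal sketch of format detection. The function name, the connector labels, and the sniffing rules are all assumptions made for this example, not the Engine's actual logic, and it does not try to tell an ATProto stream apart from any other WebSocket.

```python
import json


def _looks_like_json_feed(body: str) -> bool:
    # JSON Feed documents carry a "version" URL under jsonfeed.org.
    try:
        doc = json.loads(body)
    except ValueError:
        return False
    version = str(doc.get("version", "")) if isinstance(doc, dict) else ""
    return version.startswith("https://jsonfeed.org/version/")


def detect_format(url: str, content_type: str, body: str) -> str:
    """Guess a connector label (hypothetical names) for a source."""
    scheme = url.split(":", 1)[0].lower()
    ctype = content_type.split(";", 1)[0].strip().lower()
    head = body.lstrip()[:512].lower()

    if scheme in ("ws", "wss"):
        return "websocket"
    if ctype == "text/event-stream":
        return "sse"
    if ctype == "application/feed+json" or _looks_like_json_feed(body):
        return "json_feed"
    if "<rss" in head:
        return "rss"
    if "<feed" in head:
        return "atom"
    # Anything else is treated as a plain page to fetch and diff.
    return "html"
```

Falling back to "html" mirrors the idea that a source with no feed can still be diffed.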

real-time · seconds

ATProto / Jetstream

Any ATProto app, in plain JSON.

We speak ATProto natively. Bluesky's Jetstream is the primary source: no DAG-CBOR, no CAR decoding, no ceremony. Any app built on the AT Protocol flows through the same connector.

polled · minutes

RSS / Atom / JSON Feed

Every feed format, one shape.

Universal parsing across RSS 2.0, Atom 1.0, and JSON Feed, polled on a polite adaptive schedule. Hot feeds get checked often; quiet ones back off. You never see the same item twice.

diffed · minutes

HTML

No feed? No problem.

Plenty of the web's most valuable pages have no feed and no API at all. We extract the main content, fingerprint it, and surface genuinely new content, so the sources that matter most aren't the ones you miss.

real-time · seconds

Server-Sent Events

Any SSE stream, live and resumable.

One-way server push over plain HTTP. Reconnects resume exactly where you left off, no gaps. Wikimedia EventStreams is wired in by default. Point us at any SSE endpoint and it flows.

real-time · seconds

WebSockets

Any WebSocket live feed.

Bidirectional streaming from any WebSocket-based API. Mastodon instances, financial data streams, live dashboards. If it speaks WebSockets, we can listen. Backfill polling closes any gaps a disconnect leaves.

coming soon

Whole-Domain Crawl

A whole site, not just a page.

Add an entire domain and we'll crawl it politely, learn its shape, and surface genuinely new content across every page as one stream.

Build straight on the raw feed. Subscribe to the filtered, normalized stream over plain HTTPS and ship whatever you want on top, no app in the middle.

How it flows

From a URL to a live event

Every source, whatever its shape, travels the same five steps into one stream.

DETECT
which format a source speaks, decided the moment you add it.
FETCH
politely, on a schedule that adapts to how often it actually changes.
EXTRACT
the article, not the chrome around it.
COMPARE
a real edit, not a reloaded timestamp.
DELIVER
one event, one schema, over a stream you can resume.
An example event:
{
  "source_type": "html",
  "url": "https://example.org/notice/142",
  "title": "Procurement notice #142",
  "published_at": "2026-07-02T14:03:00Z",
  "content_hash": "e91a..."
}
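For readers consuming the stream from code, the example event could be parsed into a typed record along these lines. This is a sketch that assumes only the five fields shown in the example; the `Event` class and `parse_event` function are hypothetical names, and the real schema may carry more fields.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    source_type: str
    url: str
    title: str
    published_at: datetime
    content_hash: str


def parse_event(raw: str) -> Event:
    doc = json.loads(raw)
    # Older Python versions reject a trailing "Z", so spell out the UTC offset.
    published = datetime.fromisoformat(doc["published_at"].replace("Z", "+00:00"))
    return Event(
        source_type=doc["source_type"],
        url=doc["url"],
        title=doc["title"],
        published_at=published,
        content_hash=doc["content_hash"],
    )
```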
Under the hood

Six systems doing the hard part

INGESTION Getting to the content without getting in anyone's way.
01
CRAWLER

Polite by construction

  • A conditional-fetch layer detects an unchanged page before a single byte of content downloads.
  • Thousands of hosts crawled concurrently over multiplexed HTTP/2, each held to its own pace.
  • Crawl compliance and backoff are enforced automatically, with zero manual tuning.
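One common way to detect an unchanged page before downloading it is an HTTP conditional request. The sketch below assumes the conditional-fetch layer relies on standard `ETag` and `Last-Modified` validators; the `Validators` class and both helper functions are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Validators:
    etag: Optional[str] = None
    last_modified: Optional[str] = None


def conditional_headers(v: Validators) -> Dict[str, str]:
    """Build request headers that let the server answer 304 Not Modified."""
    headers = {}
    if v.etag:
        headers["If-None-Match"] = v.etag
    if v.last_modified:
        headers["If-Modified-Since"] = v.last_modified
    return headers


def handle_response(status: int, resp_headers: Dict[str, str],
                    v: Validators) -> Tuple[bool, Validators]:
    """Return (changed, validators to remember for the next fetch)."""
    if status == 304:
        # The server sent no body: the page is unchanged.
        return False, v
    lowered = {k.lower(): val for k, val in resp_headers.items()}
    return True, Validators(lowered.get("etag"), lowered.get("last-modified"))
```

A 304 response carries no body, which is what keeps an unchanged page from costing a full download.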
02
SCHEDULER

One heap, thousands of sources

  • A single heap-backed priority queue tracks every source at once, so scheduling cost grows only logarithmically as sources scale into the thousands.
  • Polling cadence self-tunes continuously to each source's real rhythm, no configuration required.
  • A cost-aware probe rules out the unchanged before committing to a full fetch, with circuit breakers isolating failures automatically.
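The scheduling idea above can be sketched with a binary heap keyed on each source's next due time. The class name, the default intervals, and the tuning factors (halve on change, grow by 1.5x when quiet) are assumptions chosen for the example, not the Engine's real parameters.

```python
import heapq


class Scheduler:
    def __init__(self, min_interval=60.0, max_interval=3600.0):
        self.min_interval = min_interval
        self.max_interval = max_interval
        self._heap = []      # entries are (next_due_time, source_id)
        self._interval = {}  # source_id -> current polling interval in seconds

    def add(self, source_id, now, interval=300.0):
        self._interval[source_id] = interval
        heapq.heappush(self._heap, (now + interval, source_id))

    def pop_due(self, now):
        """Return the most overdue source, or None if nothing is due yet."""
        if self._heap and self._heap[0][0] <= now:
            return heapq.heappop(self._heap)[1]
        return None

    def report(self, source_id, now, changed):
        """Reschedule after a check: speed up on change, back off when quiet."""
        current = self._interval[source_id]
        proposed = current / 2 if changed else current * 1.5
        interval = max(self.min_interval, min(self.max_interval, proposed))
        self._interval[source_id] = interval
        heapq.heappush(self._heap, (now + interval, source_id))
        return interval
```

Each push and pop costs O(log n), so one heap can cover many sources without a timer per source.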
PROCESSING Turning a fetch into a genuine, deduplicated change.
03
EXTRACTION

Clean text, every time

  • A readability-class extraction engine isolates true content from everything surrounding it.
  • Runs entirely in-process, at native speed, with no external service in the loop.
  • Every downstream system reasons over one canonical version of the truth.
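To make the extraction step concrete, here is a deliberately crude stand-in: it keeps text and drops anything inside tags that usually hold page chrome. A readability-class engine scores content blocks far more carefully; the tag list and the names below are assumptions for illustration only.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as chrome in this simplified sketch.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "form"}


class MainText(HTMLParser):
    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self._parts.append(data.strip())

    def text(self):
        return " ".join(self._parts)


def extract_text(html: str) -> str:
    parser = MainText()
    parser.feed(html)
    parser.close()
    return parser.text()
```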
04
CHANGE DETECTION

Real edits, not noise

  • An instant fingerprint check rules out true duplicates in constant time.
  • A simhash-based fuzzy-fingerprinting model tells a genuine edit from cosmetic noise.
  • Listings and index pages are diffed item by item, so nothing gets lost in the noise.
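The two-stage check above can be sketched as an exact hash followed by a simhash comparison. This is a minimal version under stated assumptions: whitespace-split tokens, a 64-bit fingerprint, and a Hamming-distance threshold of 3 are choices made for the example, not the Engine's tuned values.

```python
import hashlib

BITS = 64


def exact_fingerprint(text: str) -> str:
    # Collapse whitespace so reflowed but identical text hashes the same.
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def simhash(text: str) -> int:
    weights = [0] * BITS
    for token in text.lower().split():
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        for i in range(BITS):
            weights[i] += 1 if (h >> i) & 1 else -1
    # Each output bit is the majority vote of that bit across all tokens.
    return sum(1 << i for i in range(BITS) if weights[i] > 0)


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def classify(old: str, new: str, threshold: int = 3) -> str:
    if exact_fingerprint(old) == exact_fingerprint(new):
        return "duplicate"
    if hamming(simhash(old), simhash(new)) <= threshold:
        return "cosmetic"
    return "edit"
```

Similar texts share most tokens, so their fingerprints differ in only a few bits; the threshold decides how many differing bits still count as cosmetic.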
DELIVERY Getting the event to you, exactly once, resumably.
05
DURABILITY

Never lose an event

  • Every event is durably committed before the system ever considers it handled.
  • A deterministic identity scheme makes every replay safe, automatically.
  • Backpressure propagates end to end; nothing is ever held in memory alone.
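One way a deterministic identity scheme can make replays safe is to derive each event's ID from its content, so committing the same event twice is a no-op. The sketch below assumes the ID is a hash of the source URL and content hash; `event_id` and `EventLog` are hypothetical names, and the dictionary is an in-memory stand-in used only to show the idempotency, not a durable store.

```python
import hashlib


def event_id(source_url: str, content_hash: str) -> str:
    """Same inputs always give the same ID, so replays collide on purpose."""
    material = f"{source_url}\n{content_hash}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()


class EventLog:
    """Illustrative stand-in for a durable, keyed event store."""

    def __init__(self):
        self._events = {}

    def commit(self, source_url: str, content_hash: str, payload: dict) -> bool:
        eid = event_id(source_url, content_hash)
        if eid in self._events:
            return False  # already committed; the replay changes nothing
        self._events[eid] = payload
        return True

    def __len__(self):
        return len(self._events)
```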
06
STREAM API

One call, one schema

  • A single request opens a persistent, live stream over HTTP/2, no polling required.
  • A resumable cursor guarantees exact continuity across any disconnect.
  • Every event, from any source, arrives in one stable, versioned schema.
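On the client side, resuming a stream usually means remembering the last event ID seen and sending it back on reconnect. The sketch below parses standard server-sent-event lines and builds resume headers; it assumes the cursor is carried in the SSE `id` field and the standard `Last-Event-ID` header, which the source does not confirm.

```python
def parse_sse(lines):
    """Yield (event_id, data) for each event in an iterable of SSE text lines."""
    last_id, data = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if data:
                yield last_id, "\n".join(data)
            data = []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]
            if field == "data":
                data.append(value)
            elif field == "id":
                last_id = value


def resume_headers(cursor):
    """Headers for a reconnect that asks the server to continue after cursor."""
    headers = {"Accept": "text/event-stream"}
    if cursor is not None:
        headers["Last-Event-ID"] = cursor
    return headers
```

A client would store the most recent `event_id` after handling each event, then pass it to `resume_headers` when reconnecting.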
Reliability

Runs itself, safely

The Engine is built to keep running unattended, without dropping a thing.

Per-source isolation

A hung or failing source never delays or blocks any other.

Adaptive, not naive

Quiet sources are checked less, busy ones more, without you tuning a thing.

Resumable everywhere

A restart or a dropped connection picks up exactly where it left off.

One schema, every source

Push, feed, or diffed HTML, the event you receive looks the same.

In practice

One request, three kinds of sources

Every source ends up in the same stream. Here is what actually happens for each kind.

PUSH

A push protocol delivers an event the moment it happens, over Bluesky's Jetstream, an SSE stream, or a WebSocket.

Detect format → Normalize → Dedupe → Stream it (seconds)
FEED

An RSS, Atom, or JSON feed you are polling gets a new entry.

Poll on schedule → Parse entry → Dedupe by guid → Stream it (minutes)
HTML PAGE

A page with no feed changes.

Fetch → Extract content → Fingerprint → Diff → Stream it (minutes to tens of minutes)
Integrations

Deliver the stream anywhere you already work

The same normalized event reaches you through whichever transport fits, from a raw HTTP stream you subscribe to in one call to the chat and infrastructure your team already runs.

HTTP

Webhooks

HMAC-signed push

Every event is pushed to your endpoint as JSON, HMAC-signed so you can verify it, with automatic retries on failure. This is the developer-native path.
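Verifying a signed webhook on the receiving end might look like the sketch below. It assumes HMAC-SHA256 computed over the raw request body and sent as a hex string; the actual algorithm, encoding, and header name are not stated here, so treat those as placeholders.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, signature_hex: str) -> bool:
    expected = sign(secret, body)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode("utf-8"),
                               signature_hex.encode("utf-8"))
```

Verify against the exact bytes received, before any JSON parsing, since re-serializing the body can change it and invalidate the signature.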

Stream

SSE streaming

One HTTP call

Open a live server-sent stream with a single HTTP request, filter to the slice you want, and resume from a cursor after any disconnect with no gaps.

Chat

Chat delivery

Slack, Teams, Discord

Matches land straight in the channels your team watches. Route any stream into Slack, Microsoft Teams, and Discord without building a bot.

Queue

Message queues

Kafka, SQS-style

Write events onto Kafka and SQS-style queues so they flow directly into your own pipelines, workers, and data stores at your own pace.

Native

Native integrations

Built into your stack

Connect the stream into the products and platforms you already run, so new content shows up where your workflows live without glue code.

Every source you add flows through these six systems, then out as one clean stream you can build on. See how the same care extends to security and trust.