
The Reputation Network

A bot that gets caught attacking one publisher should not get a fresh start on the next one. That is the entire idea behind the reputation network: every site running PubSentry contributes to, and reads from, a shared memory of which entities have behaved like invalid traffic. An entity flagged on one site arrives pre-flagged on every other. This is the collective-defense moat — and it gets stronger with every site and every request, which is exactly why it cannot be copied by a competitor without the same traffic underneath it.

This page explains what the network actually stores, how an entity's reputation is computed and decayed over time, the deliberate false-positive guards built into it, and — honestly — where it is mature versus where it is still roadmap.

If you have not installed PubSentry yet, it is one async tag:

<script async src="https://pubsentry.com/t.js" data-site="st_xxxx"></script>

What an "entity" is

Reputation attaches to entities, not to people. PubSentry currently tracks reputation for two entity types on the hot path:

  • Device fingerprint: a per-device identifier the tag derives in the browser by FNV-1a hashing stable environment properties (not personal data). This is the strongest signal because it is specific to one browser or device.
  • IP: keyed by an HMAC-SHA256 hash of the address. The raw IP is never stored; the reputation key is a one-way hash, so the address cannot be recovered from the network record.
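A minimal Python sketch of the two keying schemes, for illustration only. The real tag runs in browser JavaScript; this shows just the hashing. The property names, the sorted `key=value` encoding, the server-side secret, and the function names `fingerprint_id` and `ip_reputation_key` are all assumptions, not PubSentry's actual code.

```python
import hashlib
import hmac

FNV32_OFFSET_BASIS = 0x811C9DC5
FNV32_PRIME = 0x01000193


def fnv1a32(data: bytes) -> int:
    """32-bit FNV-1a: XOR each byte into the hash, then multiply by the FNV prime."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV32_PRIME) & 0xFFFFFFFF  # keep the hash a 32-bit value
    return h


def fingerprint_id(props: dict[str, str]) -> str:
    """Hypothetical: hash a sorted, canonical encoding of stable environment properties."""
    canonical = "|".join(f"{k}={props[k]}" for k in sorted(props))
    return f"{fnv1a32(canonical.encode('utf-8')):08x}"


def ip_reputation_key(ip: str, secret: bytes) -> str:
    """HMAC-SHA256 of the address. Only this hex digest is stored, never the IP."""
    return hmac.new(secret, ip.encode("utf-8"), hashlib.sha256).hexdigest()
```

Usage: `fingerprint_id({"tz": "Europe/Berlin", "screen": "1920x1080"})` yields an 8-hex-digit device id, and `ip_reputation_key("203.0.113.7", b"server-side-secret")` yields a 64-hex-digit key. A keyed HMAC suits the IP case because the IPv4 space is small enough that an unkeyed hash could be reversed by hashing every possible address. FNV-1a is fast and non-cryptographic, which is acceptable for a device identifier.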

The data contract also defines asn, referrer, and visitor entity types for future use, but device fingerprint and IP are what carry reputation today.

A reputation record is small and intentionally non-identifying: the entity type and hashed key, a 0-100 score, the count of distinct sites that have seen it, first/last-seen timestamps, and a set of discrete flags such as datacenter_ip, known_bot, honeypot_history, or high_velocity. There is no raw IP, no User-Agent, no page URL, and no user identity in the record.
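The record described above can be sketched as a small Python dataclass. The field names, the entity-type strings, and the validation rules are assumptions for illustration. `KNOWN_FLAGS` lists only the example flags named on this page, and the real set may be larger. Note that the record has no field for a raw IP, User-Agent, URL, or user identity.

```python
from dataclasses import dataclass, field
from datetime import datetime

# All types defined in the data contract; only the first two carry reputation today.
ENTITY_TYPES = {"device_fingerprint", "ip", "asn", "referrer", "visitor"}
# Example flags from this page (assumed subset of the real flag set).
KNOWN_FLAGS = {"datacenter_ip", "known_bot", "honeypot_history", "high_velocity"}


@dataclass
class ReputationRecord:
    entity_type: str      # e.g. "device_fingerprint" or "ip"
    key_hash: str         # fingerprint id or HMAC-SHA256 IP digest, never the raw value
    score: float          # 0-100
    site_count: int       # distinct sites that have seen this entity
    first_seen: datetime
    last_seen: datetime
    flags: set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        if self.entity_type not in ENTITY_TYPES:
            raise ValueError(f"unsupported entity type: {self.entity_type}")
        if not 0 <= self.score <= 100:
            raise ValueError("score must be within 0-100")
        if self.site_count < 0:
            raise ValueError("site_count cannot be negative")
        if self.first_seen > self.last_seen:
            raise ValueError("first_seen must not be after last_seen")
        unknown = self.flags - KNOWN_FLAGS
        if unknown:
            raise ValueError(f"unknown flags: {sorted(unknown)}")
```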

How a score is built and aged

Every scored visitor folds an observation into that entity's running reputation. The math is deliberately conservative.

New observations blend in through an exponentially weighted moving average (EWMA). Each new score is mixed into the prior with a fixed weight (the latest observation counts for 40%, the prior for 60%). As a result, a single odd request does not swing an entity's standing, and a single clean request does not launder a known-bad one.
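Written out, with s_t as the latest observation's score and r_{t-1} as the entity's prior reputation (symbols chosen here for illustration), the update and two worked cases look like this. Both example observations are below 90, so plain averaging applies:

```latex
r_t = \alpha\, s_t + (1 - \alpha)\, r_{t-1}, \qquad \alpha = 0.4

% Known-bad entity (r_{t-1} = 80) sends one clean-looking request (s_t = 10):
r_t = 0.4 \cdot 10 + 0.6 \cdot 80 = 4 + 48 = 52

% Low-reputation entity (r_{t-1} = 10) sends one odd request (s_t = 80):
r_t = 0.4 \cdot 80 + 0.6 \cdot 10 = 32 + 6 = 38

% Unrolled: an observation k steps old carries weight \alpha (1 - \alpha)^k
r_t = \sum_{k=0}^{t-1} \alpha (1 - \alpha)^k \, s_{t-k} + (1 - \alpha)^t \, r_0
```

The unrolled form is where "exponential" comes from. Each older observation's influence shrinks by a factor of 0.6 per subsequent observation.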

Strong evidence is not allowed to be diluted. When an observation is itself strong (score ≥ 90 — the unambiguous, hard-rule tells), the engine takes the max with the prior rather than averaging it down. Certainty earned once is not erased by a quieter follow-up visit.
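A minimal Python sketch of the fold rule as stated on this page: an EWMA with weight 0.4, except that observations of 90 or more take the max with the prior. The function name and the first-observation behavior (the first score becomes the reputation) are assumptions.

```python
from typing import Optional

ALPHA = 0.4            # weight of the newest observation
STRONG_EVIDENCE = 90   # observations at or above this are never averaged down


def fold_observation(prior: Optional[float], observation: float) -> float:
    """Fold one scored observation into an entity's running reputation."""
    if prior is None:
        # Assumption: an entity's first observation becomes its reputation.
        return observation
    if observation >= STRONG_EVIDENCE:
        # Strong, hard-rule evidence: keep whichever score is higher.
        return max(prior, observation)
    # Ordinary evidence: exponentially weighted moving average.
    return ALPHA * observation + (1 - ALPHA) * prior
```

For a prior of 30 and a strong observation of 95, plain averaging would give 0.4 · 95 + 0.6 · 30 = 56. The max rule keeps the entity at 95.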

Reputation decays toward "unknown." This is the most important guard against permanent false positives. A score halves every 14 days of inactivity (an exponential half-life applied on read). IPs are recycled and devices are reformatted; without decay, a single old flag would block an innocent user forever. With it, a stale flag fades and a recycled IP or reformed entity recovers on its own. Records also carry a 90-day TTL, so the network forgets entities that stop showing up.
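The decay-on-read rule and the TTL can be sketched in Python as follows. The function names are hypothetical. Measuring the 90-day TTL from `last_seen` is an assumption, since the page does not say which timestamp the TTL counts from.

```python
from datetime import datetime, timedelta

HALF_LIFE_DAYS = 14.0
RECORD_TTL = timedelta(days=90)


def decayed_score(score: float, last_seen: datetime, now: datetime) -> float:
    """Apply the 14-day half-life at read time, based on time since last seen."""
    idle_days = max(0.0, (now - last_seen).total_seconds() / 86400)  # clamp clock skew
    return score * 0.5 ** (idle_days / HALF_LIFE_DAYS)


def is_expired(last_seen: datetime, now: datetime) -> bool:
    """Assumption: the 90-day TTL counts from the last time the entity was seen."""
    return now - last_seen > RECORD_TTL
```

With this rule, a score of 80 left idle for six weeks (three half-lives) reads as 10.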

How the network changes a verdict

Cross-site reputation is blended into scoring at the moment a verdict is computed, and the rule is strictly one-directional: reputation can only raise a score, never lower it. A clean-looking request from a known-bad entity is still suspect, so the engine takes the max of the locally-computed score and the network score, and records a cross_site_reputation reason citing how many sites flagged the entity. Reputation never argues a visitor down into being allowed — the local evidence already did that.
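A minimal Python sketch of the one-directional blend. The function name and the reason's dictionary shape are hypothetical. Emitting the `cross_site_reputation` reason only when the network actually raised the score is also an assumption.

```python
from typing import Optional


def blend_with_network(
    local_score: float,
    network_score: Optional[float],
    flagged_sites: int,
) -> tuple[float, list[dict]]:
    """Reputation can only raise the locally computed score, never lower it."""
    reasons: list[dict] = []
    if network_score is None:
        return local_score, reasons  # unknown entity: the local verdict stands
    final = max(local_score, network_score)
    if final > local_score:
        # Hypothetical reason shape citing how many sites flagged the entity.
        reasons.append({"code": "cross_site_reputation", "sites": flagged_sites})
    return final, reasons
```

Because the result is a `max`, a low network score can never pull down a high local score.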

There is one essential false-positive guard in this blend. It depends on treating the two entity types differently, and it is worth understanding precisely:

  • Single-site IP reputation is capped to the monitor band. Because IPs are shared (an office, a campus, a carrier-grade NAT, a coffee shop), one site's flag on an IP must not hard-block everyone else on that same IP. So an IP flagged by only one site can warn but not block on its own — its contribution is clamped (to ≤ 70, the monitor range). It takes corroboration from two or more sites before an IP acts at full strength.
  • Device fingerprint reputation acts at full strength immediately. A fingerprint is per-device, not shared, so there is no innocent-bystander risk. A device flagged anywhere on the network is treated as flagged everywhere.
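The two rules above can be sketched as a clamp applied to a network record before it is blended. The function name and entity-type strings are hypothetical. Returning 0 for an entity no site has seen is an assumption.

```python
SINGLE_SITE_IP_CAP = 70.0   # top of the monitor band: can warn, cannot block
CORROBORATING_SITES = 2     # sites needed before an IP acts at full strength


def network_contribution(entity_type: str, score: float, site_count: int) -> float:
    """Clamp what a network reputation record may contribute to a verdict."""
    if site_count < 1:
        return 0.0  # assumption: no site has seen it, so it contributes nothing
    if entity_type == "ip" and site_count < CORROBORATING_SITES:
        # Shared addresses (NAT, campus, office): one site's flag can only warn.
        return min(score, SINGLE_SITE_IP_CAP)
    # Device fingerprints, and IPs corroborated by 2+ sites, act at full strength.
    return score
```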

That corroboration requirement is the moat in one sentence: the more sites participate, the faster a shared-IP bad actor crosses from "monitored" to "blocked" — without risking a real human on a shared address.

The edge fast path

For sites fronted by the edge worker, the authoritative reputation store (owned by the ingestion service) is mirrored to an edge KV cache, so a globally distributed lookup returns in about 50 ms or less. The edge only ever reads the cache; ingest owns the writes. A cache miss or a malformed record is treated as "unknown" and fails open, meaning the visitor is allowed. The edge never blocks because it could not reach reputation.
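A minimal Python sketch of the fail-open read. `kv_get` is a hypothetical stand-in for whatever client the worker uses (Cloudflare KV or the shared Redis). The JSON record with a numeric `score` field is also an assumed format. Every failure path returns `None`, meaning "unknown", so the caller falls back to the local score alone.

```python
import json
from typing import Callable, Optional


def edge_reputation(kv_get: Callable[[str], Optional[str]], key: str) -> Optional[float]:
    """Read-only edge lookup. Any miss, error, or malformed record means "unknown"."""
    try:
        raw = kv_get(key)  # the edge only reads; ingest owns all writes
    except Exception:
        return None        # cache unreachable: fail open
    if raw is None:
        return None        # cache miss
    try:
        score = json.loads(raw)["score"]
    except (ValueError, KeyError, TypeError):
        return None        # not JSON, no "score" field, or not an object
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        return None        # score of the wrong type
    if not 0 <= score <= 100:
        return None        # out-of-range score
    return float(score)
```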

In the current self-hosted deployment, the edge and ingest share the same Redis, so no external publish is needed. A Cloudflare KV publisher is wired in for the distributed-edge configuration, activated only when Cloudflare credentials are present.

Benchmarking your site against the network

Beyond per-entity lookups, the network powers a site-level benchmark: how your invalid-traffic rate compares with that of every other active site. It is served by the /v1/network/benchmark endpoint and appears on the dashboard's Reputation Network screen.

Honesty is built into the math here, too. A percentile ranking is meaningless on a thin network, so the benchmark is explicitly flagged provisional until there are at least 20 sites in the sample. Below that, the dashboard shows the comparison but refuses to claim a false-precision percentile — we would rather show a provisional number than a confident wrong one.
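A minimal Python sketch of the provisional rule. The 20-site threshold comes from this page. Everything else is an assumption for illustration: the function name, the output fields, the percentile definition (share of sites with a higher invalid-traffic rate than yours), and the premise that the sample includes your own site. This does not describe what /v1/network/benchmark actually returns.

```python
from statistics import median
from typing import Optional

MIN_SITES_FOR_PERCENTILE = 20


def benchmark(your_rate: float, sample_rates: list[float]) -> dict:
    """Compare one site's IVT rate to the network; withhold the percentile on a thin sample.

    sample_rates is assumed to include every active site, yours among them.
    """
    provisional = len(sample_rates) < MIN_SITES_FOR_PERCENTILE
    percentile: Optional[float] = None
    if not provisional:
        # Hypothetical definition: share of sites with a higher IVT rate than yours.
        higher = sum(1 for rate in sample_rates if rate > your_rate)
        percentile = 100.0 * higher / len(sample_rates)
    return {
        "your_rate": your_rate,
        "network_median": median(sample_rates) if sample_rates else None,
        "provisional": provisional,
        "percentile": percentile,
    }
```

Below 20 sites the comparison (your rate against the network median) is still returned, but `percentile` stays `None` rather than implying false precision.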

What is built versus what is coming

In the spirit of radical honesty:

  • Built and live: cross-site reputation reads and writes on the hot path for device fingerprints and IPs; time-decay and TTL; the single-site IP cap and full-strength fingerprint rule; the edge read cache; and the network lookup (/v1/network/lookup) and benchmark (/v1/network/benchmark) endpoints behind the authenticated dashboard.
  • Coming: the network is most powerful at scale, and PubSentry is early — many entities are still unknown simply because few sites have seen them yet. ASN- and referrer-level reputation are defined in the schema but not yet on the hot path. A public reputation API and outbound webhooks for programmatic lookups are roadmap, not shipped — today reputation is consumed through the scoring engine and the dashboard, not a standalone developer API.
  • An honest limit: IP network classification comes from an offline iptoasn dataset, which sees datacenter and hosting origins but does not identify residential proxies, consumer VPNs, or Tor by IP alone. A paid IP-intelligence feed behind the same interface is the upgrade path.

The reputation network does not let us claim "100%." It does something more durable: every attack on any publisher makes the defense for all of them a little sharper, and it does so without ever trading away the first principle — never block a real human.

Stop invalid traffic before the ad fires. Score every visitor, block the invalid ones pre-serve, protect your account. Free for your first 500 pageviews.