The Freshness Moat: Freshness Is Truth Distribution (Signal Observatory Edition)
Freshness is not rewriting content. Freshness is verifiable state + disciplined signals + observability — so your updates trigger recrawl and you can prove propagation happened.
The web has a language problem:
It can publish text. It can’t publish state.
In a retrieval world, that’s fatal.
Because “freshness” is no longer a vibe. It’s a measurable distribution property: did the systems that matter re-fetch your updated truth?
We can verify some crawlers; for the rest, we measure behavior and propagation patterns.
The Old Web Can’t Express State
If your knowledge can’t emit deltas, you don’t have knowledge—you have archaeology.
Static pages are great for humans, but they do a terrible job at communicating “what changed” to machines. A crawler doesn’t want your new paragraph. It wants to know:
- what changed
- when it changed
- which machine surfaces should be revalidated
- whether the update is material
Without that, “freshness” becomes date theater: a constant churn of timestamps with no measurable recrawl.
Non-Negotiable Definitions (Use These Words)
A) Principle Pages vs. Volatile Pages
Principle Pages are stable: timeless architecture, frameworks, positioning, long-lived methods. They should rarely change.
Volatile Pages are stateful: model/tool versions, pricing, policies, compliance, SLAs, supported regions, legal constraints. They must be continuously validated.
Example: Argbe.tech can keep a principle page like “How we do GEO” stable, while volatile claims like supported markets (DACH, United States) must stay current.
B) Material Update
A material update changes one of:
- main content meaning (not cosmetic edits)
- structured data / JSON-LD
- key links that function as evidence (anchors, registries, audit packets)
- claim validity (pricing model, eligibility, scope, availability)
Google’s examples of “significant modification” include changes to main content, structured data, and links; changing a copyright date is not significant 1 .
Google also notes it only uses <lastmod> when it’s consistently and verifiably accurate 1 .
C) Propagation
Propagation is the time between shipping a material update and verified re-fetch of the updated URLs (including your machine surfaces).
D) Propagation Half-Life (PHL)
Propagation Half-Life (PHL) is a metric:
“How quickly do key fetchers re-fetch updated surfaces after an UpdateEvent?”
Diagram 1 — The Freshness Moat Triad (With Feedback Loop)
Freshness Isn’t Rewriting: It’s Re-Verification
Here’s the operational reframing:
- Evergreen principles should stay stable (you don’t rewrite your architecture every week).
- Verification state must evolve (you continuously confirm what’s still true).
So instead of “update the blog post,” you ship a Verification & Updates module that publishes accountable state:
- what is true now
- what changed
- when it was verified
- which surfaces were revalidated
A Minimal “Verification & Updates” Module (Public)
This can be a page section or a dedicated route. The point is not aesthetics — it’s machine-legible state.
{
"entity": "Argbe.tech",
"as_of": "2026-01-27",
"entity_version": "0.79.0",
"volatile_claims": {
"availability_regions": [
"DACH",
"United States"
],
"pricing_model": "Fixed weekly rate"
},
"materiality_rule": "Material update = meaning/structured-data/link/claim validity change; cosmetic edits do not count.",
"surfaces": {
"canonical_pages": [
"/",
"/contact",
"/geo-seo"
],
"machine_surfaces": [
"/entity.json",
"/llms.txt",
"/changes.json",
"/sitemap.xml"
]
}
}
Those volatile claims are already present in your golden record:
- Pricing model: Fixed weekly rate
- Regions: DACH, United States
The Signal Stack (What Actually Moves Crawlers)
Your Signal Stack is not an SEO hack. It’s a cost-reduction and distribution layer.
PHL drops when revalidation becomes cheap (ETag) and change notification becomes explicit (IndexNow + honest sitemap lastmod) 3 4 1 .
1) HTTP caching validators (ETag + Last-Modified)
Use conditional requests to make revalidation cheap:
- ETag + If-None-Match
- Last-Modified + If-Modified-Since
Google explicitly recommends considering both ETag and Last-Modified; when both exist, Google uses ETag (and recommends it for efficient revalidation) 3 .
Why this matters for GEO: it turns “checking freshness” into a low-cost operation, increasing the likelihood systems will re-fetch your machine surfaces.
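A minimal fetcher-side sketch in Python (the third-party requests package and the argbe.tech URL are assumptions; the headers themselves are standard HTTP conditional-request headers):
# Conditional revalidation: a 304 means "unchanged" and costs almost nothing.
import requests
url = "https://argbe.tech/entity.json"
first = requests.get(url, timeout=10)
etag = first.headers.get("ETag")
last_modified = first.headers.get("Last-Modified")
headers = {}
if etag:
    headers["If-None-Match"] = etag              # preferred validator when both exist
elif last_modified:
    headers["If-Modified-Since"] = last_modified
recheck = requests.get(url, headers=headers, timeout=10)
if recheck.status_code == 304:
    print("Not modified: cheap revalidation, nothing to re-ingest.")
else:
    print("Changed: new ETag =", recheck.headers.get("ETag"))
That is the fetcher's side of the bargain; your side is serving stable validators for unchanged bytes.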
2) Sitemap lastmod discipline (only significant modifications)
If you push “fake freshness” via trivial edits and stamp new dates, you’re training crawlers to ignore you. lastmod should reflect significant modification, not footer changes 1 .
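One way to keep lastmod honest is to derive it from the ChangeFeed rather than the deploy timestamp. A sketch, assuming a local changes.json shaped like the example later in this piece:
# Only material events may move <lastmod>, and only for the URLs they affect.
import json
from xml.sax.saxutils import escape
with open("changes.json", encoding="utf-8") as f:
    feed = json.load(f)
last_material = {}  # path -> date of the most recent material event touching it
for event in feed["events"]:
    if not event.get("material"):
        continue  # cosmetic edits never touch lastmod
    for path in event["affected_urls"]:
        day = event["observed_at"][:10]
        last_material[path] = max(last_material.get(path, ""), day)
entries = "\n".join(
    f"  <url><loc>https://argbe.tech{escape(path)}</loc><lastmod>{day}</lastmod></url>"
    for path, day in sorted(last_material.items())
)
print('<?xml version="1.0" encoding="UTF-8"?>\n'
      '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
      f"{entries}\n</urlset>")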
3) Push notification where relevant (IndexNow)
For participating engines, IndexNow provides a push mechanism to notify about URL changes (added/updated/deleted) 4 .
Be precise: IndexNow does not guarantee crawling or indexing; it shortens propagation when the ecosystem responds 15 13 .
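A push sketch using the documented IndexNow JSON payload (host, key, keyLocation, urlList); the key value and key-file location below are placeholders, and an accepted submission means "received", not "crawled" or "indexed":
# Notify IndexNow-participating engines after a material UpdateEvent.
# Standard library only; replace the placeholder key and keyLocation with your own.
import json
import urllib.request
payload = {
    "host": "argbe.tech",
    "key": "YOUR-INDEXNOW-KEY",                                  # placeholder
    "keyLocation": "https://argbe.tech/YOUR-INDEXNOW-KEY.txt",   # placeholder
    "urlList": [
        "https://argbe.tech/",
        "https://argbe.tech/contact",
        "https://argbe.tech/entity.json",
        "https://argbe.tech/changes.json",
    ],
}
req = urllib.request.Request(
    "https://api.indexnow.org/indexnow",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json; charset=utf-8"},
    method="POST",
)
with urllib.request.urlopen(req, timeout=10) as resp:
    print("IndexNow accepted:", resp.status)   # received, not crawled, not indexed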
4) Structured data timestamps (disciplined, not performative)
If a material update touches structured data, reflect it explicitly in JSON-LD:
- use datePublished when something is first published (where appropriate)
- bump dateModified only when the update is material (same discipline as sitemap lastmod)
Keep semantics clean: “Last verified” can update on validation cycles without implying the page materially changed; “Last updated” should only move on material changes 16 .
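A sketch of that gate; the bump_dates helper and its behavior are illustrative, not a schema.org or Google requirement:
# dateModified moves only on material events; "Last verified" is a separate,
# human-facing line that can move on every validation cycle.
from datetime import date
def bump_dates(jsonld: dict, event: dict) -> dict:
    """Copy the JSON-LD block; move dateModified only for material events."""
    updated = dict(jsonld)
    if event.get("material"):
        updated["dateModified"] = event["observed_at"][:10]
    return updated
jsonld = {
    "@context": "https://schema.org",
    "@type": "WebPage",
    "datePublished": "2025-06-01",
    "dateModified": "2025-06-01",
}
cosmetic = {"material": False, "observed_at": "2026-01-25T08:00:00Z"}
material = {"material": True, "observed_at": "2026-01-19T09:10:00Z"}
print(bump_dates(jsonld, cosmetic)["dateModified"])  # 2025-06-01 (unchanged)
print(bump_dates(jsonld, material)["dateModified"])  # 2026-01-19 (moved)
last_verified = date(2026, 1, 27).isoformat()        # page copy only, not JSON-LD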
Anti-Pattern Box — Date-Churn / Fake lastmod / Cosmetic Updates
| Anti-pattern | What it looks like | Why it fails | The fix |
|---|---|---|---|
| Date-churn freshness theater | Updating “Last updated” every week | Crawlers learn it’s noise | Tie lastmod to material updates only 1 |
| Cosmetic diffs | Swapping adjectives, changing order | Doesn't change claim validity | Publish state changes (deltas), not prose |
| "We updated!" with no signals | Edits shipped, but no machine surfaces touched | Revalidation never triggers | ETag/Last-Modified + sitemap discipline + ChangeFeed |
/llms.txt as the Agent Routing Table
Treat /llms.txt as a pragmatic entrypoint (not magic): a place to declare where your machine surfaces live.
The llms.txt format is an emerging proposal for helping LLMs and agents find and use websites at inference time 5 .
The interlock: Golden Record + Fan-Out + ChangeFeed
/llms.txt should route to:
- your golden record (/entity.json): canonical entity truth
- your fan-out exports (packet endpoints)
- your change feed (/changes.json): what changed + when
Minimal example (/llms.txt):
# argbe.tech machine surfaces
Entity: https://argbe.tech/entity.json
Changes: https://argbe.tech/changes.json
Sitemap: https://argbe.tech/sitemap.xml
# Canonical pages (humans + embedded JSON-LD)
Pages:
- https://argbe.tech/contact
- https://argbe.tech/geo-seo
# Exports (Fan-Out)
Exports:
- https://argbe.tech/contact.json
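A routing table only helps if every declared surface actually resolves. A quick health-check sketch (standard library only; the line-based URL extraction assumes the minimal format above, not the full llms.txt proposal):
# Fetch every URL declared in /llms.txt and confirm it resolves and exposes a
# cache validator (ETag or Last-Modified).
import re
import urllib.request
LLMS_TXT = "https://argbe.tech/llms.txt"
with urllib.request.urlopen(LLMS_TXT, timeout=10) as resp:
    body = resp.read().decode("utf-8")
for url in re.findall(r"https?://\S+", body):
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=10) as r:
            validator = r.headers.get("ETag") or r.headers.get("Last-Modified")
            print(f"{url}: {r.status}, validator={'yes' if validator else 'MISSING'}")
    except Exception as exc:  # an unreachable surface is a broken routing table
        print(f"{url}: FAILED ({exc})")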
ChangeFeed: Patch Notes for Truth
Agents and crawlers don’t want essays. They want deltas.
Your ChangeFeed is a public, append-only record of UpdateEvents:
- what changed
- when it changed
- why it changed (reason category)
- which surfaces were impacted
- whether it was material
What a ChangeFeed entry contains (fields + semantics)
| Field | Meaning |
|---|---|
| id | stable event id (monotonic or UUID) |
| observed_at | when the new state became canonical |
| material | boolean per your materiality rules |
| changed_paths | what fields changed in the golden record |
| affected_urls | canonical pages + machine surfaces that should be revalidated |
| evidence | optional URLs to proof artifacts (release tags, audit packets) |
Example (/changes.json), intentionally compact:
{
"version": "0.79.0",
"as_of": "2026-01-27",
"events": [
{
"id": "update-2026-01-19-001",
"observed_at": "2026-01-19T09:10:00Z",
"material": true,
"changed_paths": [
"pricing.model",
"markets.regions",
"meta.releases_repo"
],
"affected_urls": [
"/",
"/contact",
"/entity.json",
"/llms.txt",
"/changes.json",
"/sitemap.xml"
],
"evidence": [
"https://github.com/argbe-tech/releases/releases"
]
}
]
}
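A sketch of the emitting side, assuming changes.json already exists on disk in the shape shown above; the id scheme and the temp-file swap are implementation choices, not part of any spec:
# Append an UpdateEvent without rewriting history; readers never see half a file.
import json
import os
from datetime import datetime, timezone
def append_event(path, changed_paths, affected_urls, material, evidence):
    with open(path, encoding="utf-8") as f:
        feed = json.load(f)
    now = datetime.now(timezone.utc)
    event = {
        "id": f"update-{now:%Y-%m-%d}-{len(feed['events']) + 1:03d}",
        "observed_at": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "material": material,
        "changed_paths": changed_paths,
        "affected_urls": affected_urls,
        "evidence": evidence,
    }
    feed["events"].append(event)   # append-only: never edit past events
    feed["as_of"] = f"{now:%Y-%m-%d}"
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(feed, f, indent=2)
    os.replace(tmp, path)          # atomic swap
    return event
# Example: a pricing change is material and touches the money page + machine surfaces.
append_event("changes.json",
             changed_paths=["pricing.model"],
             affected_urls=["/", "/contact", "/entity.json", "/changes.json"],
             material=True,
             evidence=[])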
Metric Box — Propagation Half-Life (PHL)
PHL turns freshness into an operational KPI.
Define:
- t0 = time you ship a material UpdateEvent
- R(t) = fraction of "key fetchers" that have re-fetched the affected surfaces by time t
PHL is the time where R(t) crosses 50%.
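A computational sketch, assuming you can reduce your logs to one first-refetch timestamp per key fetcher after t0 (None if it never returned inside your window); the fetcher names are illustrative:
# PHL for one UpdateEvent: the elapsed time at which R(t) first reaches 50%.
from datetime import datetime, timedelta
from typing import Dict, Optional
def propagation_half_life(t0: datetime,
                          first_refetch: Dict[str, Optional[datetime]]) -> Optional[timedelta]:
    """Return the elapsed time at which R(t) crosses 50%, or None if it never does."""
    total = len(first_refetch)
    refetched = sorted(t for t in first_refetch.values() if t is not None)
    for i, t in enumerate(refetched, start=1):
        if i / total >= 0.5:      # R(t) just crossed one half
            return t - t0
    return None
t0 = datetime(2026, 1, 19, 9, 10)
first_refetch = {
    "googlebot":          datetime(2026, 1, 19, 14, 2),
    "bingbot":            datetime(2026, 1, 20, 7, 45),
    "declared-ai-bot":    datetime(2026, 1, 21, 3, 30),
    "unknown-automation": None,   # never came back inside the window
}
print(propagation_half_life(t0, first_refetch))   # 22:35:00 -> PHL of roughly one day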
Practical targets (illustrative):
| Surface | Goal PHL | Why |
|---|---|---|
| /entity.json + /changes.json | hours → 1 day | the machine truth should refresh fast |
| money pages (/contact) | 1–3 days | humans care; citations follow |
| deep content | 3–14 days | depends on crawl budget and demand |
Your targets depend on volatility + crawl demand.
Your moat is not “updated weekly.” It’s: “PHL is low and provable.”
Signal Observatory (Cloudflare) = Freshness Without Theater
If you don’t measure propagation, freshness is performative.
What the Observatory measures
- who re-fetches updated surfaces after UpdateEvents (as behavior buckets)
- how fast (PHL)
- whether /llms.txt and /changes.json are actually used
- why universal "agent detection" fails (Cloudflare has documented examples of stealth/undeclared crawling behavior in the ecosystem) 12
Behavior buckets (realistic, non-magical)
You can’t identify every agent, but you can bucket what you observe:
- Verified crawlers (where verification exists)
  - Google documents crawler verification via reverse DNS + forward DNS checks 2 (a verification sketch follows this list).
- Declared bots (documented UAs + robots controls)
  - OpenAI documents its crawlers and robots.txt controls (e.g., GPTBot, OAI-SearchBot) 6.
  - Some agent requests can be authenticated (signed) for allowlisting, but we still treat observability as behavior-first 14.
  - Anthropic documents Claude crawlers/modes (training vs. user-initiated browsing) 8.
  - Perplexity provides bot guidance, including verification patterns 9.
- Undeclared/stealth automation
  - Treat as "unknown automation," measured by behavior, not identity.
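The verification sketch referenced above: Google documents the reverse-DNS plus forward-DNS check for its crawlers; the sample IP and suffix list here are illustrative, and other vendors publish their own verification guidance.
# Reverse-DNS + forward-DNS verification, applied generically.
# Requests that fail it fall into the "unknown automation" bucket.
import socket
def verify_crawler(ip: str, expected_suffixes: tuple) -> bool:
    """True only if reverse DNS lands on an expected domain AND that hostname
    resolves forward to the same IP."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)            # reverse lookup
        if not hostname.endswith(expected_suffixes):
            return False
        forward_ips = socket.gethostbyname_ex(hostname)[2]   # forward lookup
    except OSError:
        return False
    return ip in forward_ips
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")          # per Google's docs
bucket = ("verified crawler"
          if verify_crawler("66.249.66.1", GOOGLE_SUFFIXES)
          else "unknown automation")
print(bucket)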
Baseline Observatory works with Cloudflare Analytics + origin logs (or just the edge metrics you already have): UA + request patterns + cache status + refetch timing.
Optional upgrade: Logpush (HTTP requests dataset) for richer, queryable event streams, including cache status and (where available/plan-enabled) bot fields. If Bot Management is enabled, bot score / verified-bot flags can improve classification and routing (e.g., bot score variables) 11 10 .
Observatory output (what we watch)
UpdateEvent → affected URLs → time-to-first refetch (verified crawlers) → refetch rate over 24/72h → PHL
The simplest “observability loop”
- Emit an UpdateEvent (ChangeFeed)
- Ensure headers/sitemap/IndexNow are correct
- Watch re-fetch patterns for:
  - /entity.json
  - /changes.json
  - /llms.txt
  - the affected canonical pages
- Compute PHL per bucket
- Fix what’s slow (signals, caching, discoverability, blockers)
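A sketch of steps 3 and 4, assuming your edge analytics can be reduced to (timestamp, path, bucket) tuples; the record shape and bucket labels are mine, not a Cloudflare schema:
# Join one UpdateEvent against refetch records and report, per behavior bucket,
# time-to-first-refetch plus refetch rate inside 24h and 72h windows.
from collections import defaultdict
from datetime import datetime, timedelta, timezone
def refetch_report(event, records, windows=(24, 72)):
    t0 = datetime.fromisoformat(event["observed_at"].replace("Z", "+00:00"))
    affected = set(event["affected_urls"])
    per_bucket = defaultdict(list)
    for ts, path, bucket in records:
        if path in affected and ts >= t0:
            per_bucket[bucket].append((ts, path))
    report = {}
    for bucket, hits in per_bucket.items():
        rates = {f"{h}h": len({p for ts, p in hits if ts - t0 <= timedelta(hours=h)}) / len(affected)
                 for h in windows}
        report[bucket] = {"first_refetch_after": min(ts for ts, _ in hits) - t0,
                          "refetch_rate": rates}
    return report
event = {"observed_at": "2026-01-19T09:10:00Z",
         "affected_urls": ["/entity.json", "/changes.json"]}
records = [
    (datetime(2026, 1, 19, 15, 0, tzinfo=timezone.utc), "/entity.json", "verified crawler"),
    (datetime(2026, 1, 20, 2, 30, tzinfo=timezone.utc), "/changes.json", "declared bot"),
]
print(refetch_report(event, records))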
Optional measurement hook: OpenAI documents that publishers can track referral traffic from ChatGPT using UTMs 7 . Use it as a reality check, not as the primary freshness metric.
Close: Freshness Is a Distribution Problem
Phase 2 made truth structured.
Phase 3 makes truth alive.
Competitors can mimic your prose.
They can’t mimic a system that ships verifiable deltas + disciplined signals + measured propagation.
If your update can’t be observed propagating, it didn’t happen.