Data flow profiler
Problem
Right now we only see parts of the pipeline. The daemon has eztrc, the desktop app has a console flag. But nothing follows one piece of data from start to finish: from the moment a blob arrives in the daemon, through the channel between the daemon and the desktop, into the window, and finally onto the screen.
So when a new comment takes too long to show up, or it never shows up, we don't know where it died. Was it the daemon? The poll in the desktop? The message to the window? The refetch? The render? We are guessing every time.
I want a small tool that answers this question on one page.
Solution
The idea is simple. Both sides drop little timestamps along the way, all tagged with the same key. The daemon collects them in memory and shows a page where you can see the full timeline per piece of data.
One shared key
We use hm://<account>/<path>?v=<version_cid>. The daemon already has this when it indexes a blob, the desktop already has this when it asks for a doc. Same string on both sides. No new IDs.
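Here is a minimal Go sketch of building the key. It assumes account, path and version CID are already plain strings on the daemon side. journeyKey is a hypothetical name, and trimming a leading "/" from the path is an assumption about how paths are stored.

```go
package main

import (
	"fmt"
	"strings"
)

// journeyKey builds the shared trace key hm://<account>/<path>?v=<version_cid>.
// Hypothetical helper name. A leading "/" on path is trimmed so the key
// never contains "//" after the account.
func journeyKey(account, path, versionCID string) string {
	return fmt.Sprintf("hm://%s/%s?v=%s", account, strings.TrimPrefix(path, "/"), versionCID)
}
```

Whatever formatter ends up being used, the desktop must produce a byte-identical string. Otherwise the stamps from the two sides will not join up.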
How the timestamps get to the daemon
One new gRPC method, Telemetry.RecordCheckpoints. The desktop side batches stamps and sends them every second. The daemon keeps them in a ring buffer (in memory only, like trcstats). No database, no disk.
On by default, free to turn off
Profiler runs by default. To turn it off, set SEED_PROFILER=0. The hot path is one time.Now() plus one append to a small buffer, so the cost is basically nothing. If pprof shows otherwise we fix it before merging.
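Here is a minimal Go sketch of the switch and the stamp call. It assumes the flag is read once at process start and that a separate loop calls Drain once per second. stampBuffer, Stamp and Drain are hypothetical names, and the desktop would need the same logic in its own code.

```go
package main

import (
	"os"
	"sync"
	"time"
)

// profilerEnabled is read once at startup. Only SEED_PROFILER=0 turns the
// profiler off. Unset or any other value leaves it on (the default).
var profilerEnabled = os.Getenv("SEED_PROFILER") != "0"

type stamp struct {
	key   string
	stage string
	ts    time.Time
}

// stampBuffer holds pending stamps until the next periodic flush.
type stampBuffer struct {
	mu      sync.Mutex
	pending []stamp
}

// Stamp is the hot path: one time.Now() plus one append. append only
// allocates when the backing array has to grow, and Drain keeps the
// previous capacity so that stays rare. When disabled, it returns
// before reading the clock.
func (b *stampBuffer) Stamp(key, stage string) {
	if !profilerEnabled {
		return
	}
	now := time.Now()
	b.mu.Lock()
	b.pending = append(b.pending, stamp{key: key, stage: stage, ts: now})
	b.mu.Unlock()
}

// Drain hands back the pending stamps and starts a fresh buffer.
func (b *stampBuffer) Drain() []stamp {
	b.mu.Lock()
	out := b.pending
	b.pending = make([]stamp, 0, cap(out))
	b.mu.Unlock()
	return out
}
```

Only the exact value 0 disables the profiler, which matches "on by default". Reading the environment once keeps os.Getenv off the hot path.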
The checkpoints
This is the list of stamps and where they fire. For each one: the side that does it, and the area of code.
backend.blob_indexed: daemon (Go). Blob indexer / activity feed indexer, where observe_time is set.
backend.feed_emitted: daemon (Go). Activity API ListEvents, for each NewBlobEvent.
backend.grpc_request_received: daemon (Go). gRPC server interceptor (preferred) or the entry of GetDocument / GetEntity / GetAccount / capability lookups.
backend.grpc_response_sent: daemon (Go). Same place, on the way out.
main.feed_event_received: desktop, Node side. app-sync.ts fetchNewEvents, per event from the poll.
main.invalidation_broadcast: desktop, Node side. app-sync.ts processEvents, right before each appInvalidateQueries.
renderer.invalidation_received: desktop, window side. app-invalidation.ts, on each window.
renderer.refetch_start: desktop, window side. React Query refetch for useEntity / useAccount / useCapabilities.
renderer.grpc_call_start: desktop, window side. Connect transport interceptor (grpc-client.ts).
renderer.grpc_call_end: desktop, window side. Same interceptor, on response.
renderer.cache_updated: desktop, window side. Hook onSuccess for entity / account / capability queries.
renderer.component_rendered: desktop, window side. useEffect when data first becomes defined, in the doc / profile / capability render components.
The "new blob arrived" path passes through every stage in the list. The "user opened a doc" path skips the feed stages and starts at renderer.grpc_call_start.
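For the timeline columns, the daemon needs the stages in firing order. The sketch below assumes the backend gRPC stages fall between renderer.grpc_call_start and renderer.grpc_call_end, since that is when the daemon serves the window's request. stageOrder, stageIndex and expectedStages are hypothetical names.

```go
package main

// stageOrder lists every checkpoint in firing order on the
// "new blob arrived" path (assumed ordering, see the lead-in).
var stageOrder = []string{
	"backend.blob_indexed",
	"backend.feed_emitted",
	"main.feed_event_received",
	"main.invalidation_broadcast",
	"renderer.invalidation_received",
	"renderer.refetch_start",
	"renderer.grpc_call_start",
	"backend.grpc_request_received",
	"backend.grpc_response_sent",
	"renderer.grpc_call_end",
	"renderer.cache_updated",
	"renderer.component_rendered",
}

// stageIndex maps a stage name to its position, or -1 if unknown.
func stageIndex(stage string) int {
	for i, s := range stageOrder {
		if s == stage {
			return i
		}
	}
	return -1
}

// expectedStages returns the stages a path should pass through. The
// "user opened a doc" path skips the feed stages and starts at
// renderer.grpc_call_start.
func expectedStages(openedByUser bool) []string {
	if openedByUser {
		return stageOrder[stageIndex("renderer.grpc_call_start"):]
	}
	return stageOrder
}
```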
Refreshes and the same data more than once
People refresh. The app opens the same doc again later. The transport retries by itself. The same hm:// URL shows up again and again. If we just appended new stamps to one timeline per URL, the first attempt (the one that died) would disappear under the next one. And the dead one is exactly the interesting one.
So the daemon does not keep "one timeline per URL". It keeps one timeline per attempt, numbered gen 1, gen 2, and so on. A new attempt opens a new generation. Old ones get closed and stay around so you can still look at them.
A new generation opens when a "starting" stamp comes in:
backend.blob_indexed (daemon got a new blob)
backend.grpc_request_received (daemon got a fresh request)
renderer.grpc_call_start (window started a new call)
Everything else just continues the current generation.
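Taken literally, backend.grpc_request_received and renderer.grpc_call_start would each open a new generation in the middle of a blob attempt that is still running, which would split one journey in two. The sketch below assumes a narrower reading: a starting stamp opens a new generation only when nothing is open for the key, or when the open generation has already recorded that same stage (a retry or a refresh). That reading and all names are hypothetical.

```go
package main

// startingStages are the only stamps that may open a new generation.
var startingStages = map[string]bool{
	"backend.blob_indexed":          true,
	"backend.grpc_request_received": true,
	"renderer.grpc_call_start":      true,
}

// Generation is one attempt for one key.
type Generation struct {
	N      int      // gen 1, gen 2, ...
	Open   bool     // false once complete or replaced by a newer attempt
	Stages []string // stages in arrival order
	seen   map[string]bool
}

// opensNew reports whether stage starts a new generation. Assumed reading:
// it must be a starting stamp, and either nothing is open or the open
// generation has already recorded this same stage.
func opensNew(cur *Generation, stage string) bool {
	if !startingStages[stage] {
		return false
	}
	return cur == nil || !cur.Open || cur.seen[stage]
}

// Apply records one stage for one key. gens holds every generation for the
// key, oldest first. Closed ones are kept so dead attempts stay visible.
func Apply(gens []*Generation, stage string) ([]*Generation, *Generation) {
	var cur *Generation
	if len(gens) > 0 {
		cur = gens[len(gens)-1]
	}
	if opensNew(cur, stage) {
		if cur != nil {
			cur.Open = false // older attempt is closed, not deleted
		}
		cur = &Generation{N: len(gens) + 1, Open: true, seen: map[string]bool{}}
		gens = append(gens, cur)
	}
	if cur == nil || !cur.Open {
		return gens, nil // nothing open to continue: this sketch drops the stamp
	}
	cur.Stages = append(cur.Stages, stage)
	cur.seen[stage] = true
	if stage == "renderer.component_rendered" {
		cur.Open = false // complete
	}
	return gens, cur
}
```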
A generation closes in one of three ways:
It reached renderer.component_rendered → complete (green).
A newer attempt for the same resource started before this one finished → coalesced (blue). Normal during bursts.
30 seconds passed with no new stamp → abandoned (red if late stage, grey if early). This is the bug bucket.
You can refresh the same doc five times and see five generations side by side. The dead ones stay visible. Multiple rows for the same URL are already a signal: something is hitting it many times.
The /debug/journeys page
Loopback only. Two views at the top:
All: every trace, every generation. Columns: key, gen, last stage, status, total time, per-stage deltas. Sortable. Gaps over 200ms get highlighted in red.
Broken paths: only abandoned traces, grouped by last_stage. So you see at a glance "12 dead at renderer.refetch_start, 3 at main.invalidation_broadcast". This is the page you open when something is not working. It points straight to the part of the pipeline that is losing data.
A small banner at the top of "Broken paths" counts the most common death sites in the last 5 minutes. A quick look tells you if something is flaky right now.
Scope
One developer, phased:
Proto + telemetry server + ring buffer + /debug/journeys page (with both views).
Backend stamp sites (blob_indexed, feed_emitted, grpc_request_received / _sent).
Desktop side: profiler module + Connect interceptor + Node-side stamps + window-side stamps.
End-to-end checks: happy path, abandoned, coalesced, refresh / generation correctness, cost check on pprof.
We start with documents, profiles and capabilities. They share the same flow shape (event → invalidation → refetch → render) so they go together. Comments come next, in a separate step.
Rabbit Holes
Hot-path cost going up by accident. One innocent map allocation in a stamp and the budget is gone. We need a pprof check before merging.
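One cheap guard to run next to the pprof check is Go's testing.AllocsPerRun, which reports the average number of heap allocations per call of a function. It is used here as an extra check, not as a replacement for the pprof check the plan asks for. The fixed-array buffer and the stamp function are hypothetical.

```go
package main

import (
	"sync"
	"testing"
	"time"
)

type checkpoint struct {
	key, stage string
	ts         time.Time
}

// fixedBuf is a pre-allocated ring, so stamp never grows a slice.
type fixedBuf struct {
	mu   sync.Mutex
	data [1024]checkpoint
	n    int // total stamps written; n % len(data) is the next slot
}

var buf fixedBuf

// stamp is the hot path under test: one clock read plus one slot write.
func stamp(key, stage string) {
	now := time.Now()
	buf.mu.Lock()
	buf.data[buf.n%len(buf.data)] = checkpoint{key: key, stage: stage, ts: now}
	buf.n++
	buf.mu.Unlock()
}

// stampAllocs reports the average heap allocations per stamp call.
func stampAllocs() float64 {
	return testing.AllocsPerRun(1000, func() {
		stamp("hm://acc/doc?v=cid", "renderer.grpc_call_start")
	})
}
```

In the repo, this would sit in a _test.go file as a test that fails when the count is above zero. That way a stray map or slice allocation in the stamp path gets caught before merging.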
Generation rules under retries. The rule for which stamp opens a new generation and which one just continues must be exact, or timelines will hide each other.
Carrying the key through the invalidation message between the desktop's two sides. The current message doesn't have a place for the URL. We need to add one. Small change but it touches a shared shape.
The Connect interceptor needs a small map of "this method puts the account / path / version in these request fields". Boring, easy to get wrong, but small.
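The interceptor itself is TypeScript in grpc-client.ts. This Go sketch only shows the shape of the table and one safety choice. The method name and the request field names are hypothetical placeholders, not the real proto definitions, and request fields are modeled as a plain string map for brevity.

```go
package main

import (
	"fmt"
	"strings"
)

// keyFields says which request fields carry the account, path and version.
type keyFields struct {
	Account, Path, Version string
}

// methodKeys is the hand-written table, one entry per stamped method.
// Method and field names are hypothetical.
var methodKeys = map[string]keyFields{
	"GetDocument": {Account: "account", Path: "path", Version: "version"},
}

// keyFor builds the hm:// key for a request. It returns false for unknown
// methods or missing fields: better no stamp than a stamp on the wrong key.
func keyFor(method string, req map[string]string) (string, bool) {
	f, ok := methodKeys[method]
	if !ok {
		return "", false
	}
	acc, path, ver := req[f.Account], req[f.Path], req[f.Version]
	if acc == "" || ver == "" {
		return "", false
	}
	return fmt.Sprintf("hm://%s/%s?v=%s", acc, strings.TrimPrefix(path, "/"), ver), true
}
```

With this choice, a mistake in the table shows up as missing stamps rather than as stamps silently joined to the wrong timeline.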
renderer.component_rendered must fire when the data is actually painted, not when the hook returns. It needs a useEffect on data first becoming defined, not on the hook call.
No Gos
No disk or database storage. Ring buffer in memory only. Same as trcstats.
No comments now. They come later.
No public exposure. /debug/journeys stays on loopback.
No free-form attributes on the hot path. Stamps are (key, stage, ts) only.
No alerts or SLO dashboards. Just the page and the broken paths view. If we need alerts later, that's a different project.