Why does contention of writers affect the daemon’s ability to read?

Core question

In SQLite WAL mode, readers and one writer should usually coexist. So why does writer contention appear to hurt the daemon’s ability to read?

What we know

The daemon makes many concurrent write attempts to a database that allows only one writer at a time.

Observed writer contention:

many callers enter BEGIN IMMEDIATE

only one can hold the SQLite writer lock

others wait inside the SQLite busy handler

after ~10s they return SQLITE_BUSY

Important distinction

Writer contention does not necessarily block reads directly at the SQLite lock level.

In WAL mode:

Many readers + one writer can coexist
Only one writer can write

But writer contention can still indirectly hurt reads.

Main mechanism: shared connection pool starvation

Reads and writes share the same sqlitex.Pool.

When many writer goroutines are stuck waiting in BEGIN IMMEDIATE, each one may hold a pooled SQLite connection.

Then read paths can block before SQLite even executes the read:

conn, release, err := db.Conn(ctx)

So the read is not blocked by a SQLite read lock. It is blocked because all/most DB connections are occupied by writers waiting for the write lock.

This is probably the main explanation.

Other mechanisms

CPU / I/O pressure

Heavy writers like indexing/sync can consume:

disk bandwidth

page cache

CPU

SQLite internal work

Reads are logically allowed, but they still compete for machine resources.

WAL growth / checkpoint pressure

In WAL mode, writes append to db.sqlite-wal.

If checkpointing falls behind, the WAL grows. A large WAL can make reads and checkpoints more expensive.

We observed an old daemon with a very large WAL file (~679 MB).

Mixed request pipelines

Many user-visible operations are not pure reads.

Example pipeline:

read peer list
connect to peers
update peers table
fetch blobs
PutMany/index blobs
read document/events

So a user may experience “reading is slow”, but the request may be waiting on best-effort writes somewhere earlier in the pipeline.

“Read-looking” code may still use savepoints

Many read paths use WithSave. Pure read savepoints are okay, but they still hold a DB connection. If any statement mutates temp/real tables, it can enter writer accounting.

Current likely offenders / signals

From debug snapshots:

hmnet.(*Node).connect has high begin_busy count and ~10s waits.

Its own hold time is tiny.

So it is mostly a victim/amplifier.

blob.(*Index).PutMany-range1 can hold writer lock for hundreds of ms.

sync/indexing paths can repeatedly take the writer lock.

domain cache writes are also small but can add pressure.

feed/list/event queries are mostly read-side performance concerns, not necessarily direct writer-lock holders.

Key conclusion

The issue is likely not:

writer lock directly blocks all reads

It is more likely:

writer storm consumes DB pool + I/O + scheduler capacity
so reads cannot get resources promptly

Proposed direction

We need both:

Reduce write work

optimize heavy SQL

shorten transactions

batch writes carefully

Control write admission

avoid many goroutines racing BEGIN IMMEDIATE

use Go-level queue/gate for writes

batch/coalesce best-effort writes

prioritize user-visible writes

Separate/guard read capacity

consider read vs write pool behavior

ensure writers waiting for lock do not consume all connections needed by reads

Practical next fix

Move hmnet.(*Node).connect peer-row update into peerWriter.

Reason:

it is best-effort bookkeeping

currently it can wait 10s on SQLite busy

it creates many busy events

it should not block successful network connect or read/sync flows

Then consider broader writer admission control for hot paths like PutMany, domain updates, and sync writes.

Do you like what you are reading? Subscribe to receive updates.

Unsubscribe anytime