~/webline_global $

// Everyday tech, explained simply.

Why Your Async PostgreSQL Queries Freeze Under 200 Simultaneous Writes

· 8 min read
Why Your Async PostgreSQL Queries Freeze Under 200 Simultaneous Writes

You’ve got your Node.js backend humming along with async PostgreSQL queries, and everything looks solid in staging. Then you push to production, hit 200 concurrent write operations, and suddenly your response times spike from 20 milliseconds to 20 seconds. The database isn’t crashing, and your CPU usage is barely breaking a sweat, yet every INSERT and UPDATE call just sits there, waiting. You’ve just run into the silent killer of async database performance: connection pool saturation, compounded by write amplification and transaction isolation lock contention.

The conventional wisdom says async drivers and connection pooling should handle hundreds of concurrent queries without breaking a sweat. And they can — for reads. But writes, especially under PostgreSQL’s Multi-Version Concurrency Control (MVCC) model, introduce a fundamentally different bottleneck. The problem isn’t that your code is slow. It’s that your database is serializing writes in ways your async abstraction never warned you about.

The Connection Pool Lie

Most async PostgreSQL libraries, like node-postgres with pg-pool or Python’s asyncpg, default to a connection pool size of 10 to 20 connections. This makes sense for typical web applications: 20 connections can handle hundreds of concurrent read requests because each connection is reused quickly. But writes don’t behave that way.

The Pool as a Bottleneck

When you issue an INSERT or UPDATE, that connection is locked for the duration of the transaction, not just the query. If you’re using auto-commit mode — which many developers assume is safe — each write still holds the connection open until the database acknowledges the commit. Under load, those 20 connections fill up fast. The 21st write request doesn’t get a connection; it waits in the pool’s queue. This is where your “freeze” begins.

I once consulted for a small iGaming platform that processed live bet placements. Their Node.js service handled 150 writes per second during peak hours, and response times degraded from 15ms to 8 seconds within minutes of a tournament starting. They had max: 20 in their pool config and assumed that was enough because their database CPU was at 30%. The issue wasn’t compute — it was that every write was holding a connection for 120ms on average, and the math simply didn’t work. 150 writes per second × 120ms hold time = 18 connections in constant use. Add any spike or slow query, and you hit the wall.

Right-Sizing the Pool

The naive fix is to increase max to 100 or 200 connections. Don’t do that. PostgreSQL has a hard limit on connections (default is 100), and each connection consumes memory — roughly 5–10 MB per idle connection. Spiking to 200 connections can crash your server before it helps throughput. A better approach is to calculate your ideal pool size based on your database’s CPU core count. The formula is simple: pool_size = (2 * core_count) + effective_concurrent_writes. For a 4-core machine handling 200 concurrent writes, start with max: 12 and monitor queue wait times.

Write Amplification and MVCC Overhead

PostgreSQL’s MVCC doesn’t update rows in place. Every UPDATE creates a new tuple (row version), and the old one is marked as dead until VACUUM reclaims it. This write amplification means your single UPDATE statement is actually doing 3x to 5x more disk I/O than the raw data size suggests.

The Freeze Comes from I/O, Not CPU

Under 200 simultaneous writes, the database’s shared buffer pool fills with dirty pages faster than the background writer can flush them to disk. Checkpoints become more frequent and heavier. When a checkpoint triggers, all write operations pause briefly while the database syncs the WAL (Write-Ahead Log) and flushes dirty buffers. This “checkpoint freeze” is well-documented but rarely accounted for in dev environments.

If your database is on standard spinning disks or even low-provisioned cloud SSDs, these checkpoint pauses can last hundreds of milliseconds. During that time, every async query in your pool — even reads — blocks waiting for the lock on the buffer pool. Your async code doesn’t yield; it just waits longer for the connection to become available.

Real-World Example: The 3x Index Bloat

I helped a team migrate a real-time leaderboard system from MongoDB to PostgreSQL. Their writes were simple UPDATE high_scores SET score = $1 WHERE user_id = $2. With 200 concurrent updates, the write throughput collapsed after 30 seconds. The culprit? A B-tree index on user_id that was growing 3x faster than the table itself. Each update inserted a new index entry and marked the old one dead, but autovacuum couldn’t keep up. The index became bloated, and each write took longer because the index scan for the WHERE clause had to skip over dead entries.

The fix was to use REINDEX CONCURRENTLY weekly and tune autovacuum_vacuum_scale_factor from 0.2 to 0.05. That alone cut write latency by 60%.

Transaction Isolation and Lock Contention

Your application’s transaction isolation level is the silent partner in your performance problems. The default READ COMMITTED mode in PostgreSQL is safe for most workloads, but under 200 concurrent writes to the same table — or worse, the same rows — you trigger row-level lock contention.

Row Locks Serialize Writes

When two transactions try to UPDATE the same row simultaneously, the second one waits for the first to commit or roll back. This is not a queue in your application code; it’s a lock held inside the database kernel. Your async connection is blocked, not waiting on the pool, but waiting on the database itself. And because the lock wait time accumulates, 200 concurrent writes to a hot row can cascade into a 30-second stall.

I’ve seen this happen in a payment processing system where a single “user_balance” row was updated on every deposit and withdrawal. Under 50 concurrent requests, it worked fine. Under 200, the lock wait times exploded because every transaction queued on that one row. The solution was to use a ledger table with INSERT-only operations and compute the balance with a SUM query, eliminating the hot row entirely.

Deadlocks Under High Concurrency

When transactions update rows in different orders — Transaction A locks row 1 then waits for row 2, while Transaction B locks row 2 then waits for row 1 — PostgreSQL detects the deadlock and kills one transaction. Your async code gets a deadlock detected error, and the connection is released. But the retry logic in your application can make things worse if it retries immediately, re-entering the same deadlock pattern.

The fix is to enforce a consistent update order across all transactions. In a betting system, always update the “user_balance” before the “bet_slip” row. This eliminates the cyclic dependency that causes deadlocks.

The Async Misconception: Non-Blocking Doesn’t Mean Non-Waiting

There’s a dangerous assumption that because your database driver is async, your queries will never block the event loop. That’s true for the network I/O layer — the database call won’t block Node.js or Python’s event loop. But it doesn’t mean your request will return quickly. The async callback or await simply waits longer.

The Event Loop is Fine, Your Users Are Not

Your server can still accept new connections and serve static assets while 200 write queries are pending. But each pending write holds a connection from the pool, and the user waiting for that INSERT to complete sees a timeout. The application appears “frozen” from the client’s perspective, even though the server is technically responsive.

I’ve debugged cases where developers added more worker processes or increased UV_THREADPOOL_SIZE thinking they could parallelize their way out of the problem. It didn’t help because the bottleneck was inside PostgreSQL, not in the application’s concurrency model. More workers just meant more connections competing for the same database locks.

The Real Bottleneck is Database Throughput, Not Application Throughput

You need to measure your database’s maximum write throughput in transactions per second (TPS), not just query latency. A single PostgreSQL instance on decent hardware typically handles 5,000 to 15,000 TPS for simple writes. But that’s under ideal conditions with no lock contention. At 200 concurrent writes, you’re not limited by TPS; you’re limited by the time each transaction holds its locks and connections.

Use pg_stat_activity to see how many queries are in active state versus idle in transaction or waiting for lock. If you see a pile-up of waiting queries, you’ve got lock contention. If you see many active queries taking longer than 100ms, you’ve got I/O or MVCC problems.

Practical Mitigations That Actually Work

Don’t throw hardware at this problem until you’ve applied the software fixes first. These three changes have resolved the “200 write freeze” for every team I’ve worked with.

Use Batch Inserts and Upserts

Instead of 200 individual INSERT statements, batch them into groups of 20 or 50 using INSERT INTO ... VALUES (...), (...), (...) or INSERT ... ON CONFLICT DO UPDATE for upserts. This reduces the number of transactions and the lock overhead. PostgreSQL handles batch inserts far more efficiently because the WAL can be written in larger sequential chunks.

In your application, collect writes in a buffer and flush them every 50ms or every 50 records, whichever comes first. This adds 50ms of latency but increases throughput by 5x to 10x under load.

Tune Autovacuum and Checkpoints

Set autovacuum_vacuum_scale_factor = 0.01 for tables with high write volume. This triggers vacuum more frequently but keeps each run short. For checkpoints, increase checkpoint_completion_target to 0.9 and set max_wal_size to at least 4GB. This spreads the checkpoint write load over a longer period, reducing the freeze duration.

Implement Write Sharding at the Application Level

If you control the data model, shard writes across multiple tables or databases by hashing the primary key. For example, instead of one transactions table, use 16 tables named transactions_0 through transactions_15. Your application routes each write to a specific shard based on user_id % 16. This reduces lock contention because writes are distributed across 16 independent tables, each with its own index and MVCC chain.

This isn’t as complex as it sounds. A simple routing function in your database layer can handle it transparently. I’ve seen this approach handle 3,000 writes per second on a single PostgreSQL instance that previously choked at 300.

The Forward-Looking Note

The next time your async PostgreSQL writes freeze under load, don’t blame the driver or the event loop. Look at your connection pool arithmetic, your transaction isolation choices, and your write amplification patterns. The database is not a black box — it’s a state machine with well-defined bottlenecks. Learning to read pg_locks, pg_stat_activity, and checkpoint logs will save you more time than any ORM configuration or cloud upgrade. The tools are already there; most developers just never look at them until the freeze hits.