Why Your Node.js Backend Slows Down After 10 Concurrent Users


You just shipped a Node.js API that handles everything flawlessly in local testing. You deploy it, maybe three friends hit it, response times are snappy—under 50ms. Then someone posts a link on Reddit, traffic jumps to 10 concurrent users, and suddenly your "lightning-fast" backend starts serving responses in 4 seconds. What happened? Did your VPS suddenly get throttled? Did your database crash? The answer is almost certainly simpler than you think—and it has nothing to do with your hardware.

The Single-Threaded Lie We Tell Ourselves

Node.js is famous for being “non-blocking” and “event-driven,” but that comes with a catch most tutorials gloss over: your application code runs on a single thread. That single thread is your JavaScript event loop, and it’s the most common bottleneck I see in production backends built by indie devs.

When you test locally with one user, the event loop processes one request, hands off the I/O (database query, file read, external API call) to the system kernel, and immediately picks up the next task. It feels like magic because there’s no queue. But when 10 users hit your API simultaneously, those requests form a queue on that single thread. If any one of those requests contains a CPU-heavy operation—parsing a large JSON payload, sorting an array of 10,000 objects, or running a cryptographic function—the entire event loop freezes.

I once watched a colleague’s authentication middleware block all other requests because it was synchronously hashing passwords with crypto.createHash inside a for loop that processed 500 user records. On the third concurrent login, the whole server stalled for 2.3 seconds. The fix was trivial: switch to an asynchronous crypto API (crypto.scrypt and crypto.pbkdf2 run on libuv’s thread pool rather than the main thread) or offload the loop to a worker thread. But the damage had already been done: users saw a spinning wheel and bounced.
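As a minimal sketch of the “async crypto” route, here is what non-blocking password hashing could look like with Node’s built-in crypto.scrypt. Note that this swaps createHash for scrypt, which is my substitution and not what the colleague’s code used; hashPassword, hashAll, and the record shape ({ password, salt }) are hypothetical names for illustration.

```javascript
const crypto = require('node:crypto');

// Hypothetical helper: derives a key from one password without blocking
// the event loop. crypto.scrypt does its work on libuv's thread pool and
// invokes the callback when the result is ready.
function hashPassword(password, salt) {
  return new Promise((resolve, reject) => {
    crypto.scrypt(password, salt, 64, (err, derivedKey) => {
      if (err) reject(err);
      else resolve(derivedKey.toString('hex'));
    });
  });
}

// Hashes many records concurrently instead of in a synchronous for loop.
// Assumes each record looks like { password, salt }.
function hashAll(records) {
  return Promise.all(records.map((r) => hashPassword(r.password, r.salt)));
}
```

While these hashes are being computed, the event loop stays free to accept and answer other requests. The thread pool has a limited number of threads (four by default), so a large batch still takes time to finish; it just no longer freezes everything else.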

The Event Loop Starvation Problem

Here are the mechanics of what happens. Your Node.js process has a single call stack. When a request comes in, its handler runs on that stack. If the handler triggers a synchronous operation, even something as innocent as JSON.parse(body) on a 2MB payload, the stack won’t unwind until parsing finishes. Meanwhile, the callbacks for the other nine requests sit in the event loop’s queue, waiting.

This isn’t a bug. It’s by design. But the design assumes you’ll never block the event loop with synchronous CPU work. Most indie devs accidentally block it within the first hundred lines of their route handlers.
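To make the queueing visible, here is a small simulation that needs nothing beyond Node itself. It is a sketch, not a benchmark: busyWork stands in for any synchronous CPU work, and simulateRequests is a hypothetical helper that queues several “requests” at the same instant and reports how long each one took to complete.

```javascript
const { performance } = require('node:perf_hooks');

// Stand-in for synchronous CPU work (a big JSON.parse, a sort, a hash).
// It spins on the main thread for at least `ms` milliseconds.
function busyWork(ms) {
  const end = performance.now() + ms;
  while (performance.now() < end) {
    // spin
  }
}

// Queues `count` simulated requests at once and resolves with each
// request's latency in ms, measured from the moment they were all queued.
function simulateRequests(count, workMs) {
  const start = performance.now();
  const jobs = [];
  for (let i = 0; i < count; i++) {
    jobs.push(new Promise((resolve) => {
      setImmediate(() => {
        busyWork(workMs); // nothing else can run while this spins
        resolve(performance.now() - start);
      });
    }));
  }
  return Promise.all(jobs);
}
```

Calling simulateRequests(10, 10) should show latencies that climb by about 10ms per request, with the last one waiting at least 100ms even though its own work took only 10ms.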

How to Diagnose It

You don’t need fancy APM tools to spot event loop blocking. Start your server locally with node --inspect, open chrome://inspect in Chrome, attach DevTools to the process, and record a performance profile while simulating 10 concurrent users with autocannon or wrk. Look for long tasks that exceed 50ms. If you see any task taking 100ms or more, you’ve found your bottleneck. The fix is almost always moving that work to a Worker Thread, child_process.fork, or restructuring the code to be truly asynchronous.
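For the Worker Thread option, a minimal sketch might look like the following. The sorting task is only an example of CPU-heavy work, sortInWorker is a hypothetical name, and the worker source is inlined as a string purely so the example fits in one file; in a real project it would normally live in its own module.

```javascript
const { Worker } = require('node:worker_threads');

// Code that runs on the worker thread. It receives the input through
// workerData and posts the sorted copy back to the main thread.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  const sorted = workerData.slice().sort((a, b) => a - b);
  parentPort.postMessage(sorted);
`;

// Hypothetical helper: sorts an array of numbers off the main thread.
function sortInWorker(numbers) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: numbers });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}
```

Starting a worker has its own cost, so spawning one per request only pays off for genuinely heavy tasks. For steady load, a small pool of long-lived workers is the more common design.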

The Database Connection Pool Trap

Even if your event loop is perfectly non-blocking, your database can still sink your performance. The classic mistake is creating a new database connection for every incoming HTTP request. I see this pattern constantly in tutorials and boilerplates: const client = new Client() inside a route handler, followed by client.connect() and client.end().

With one user, this works fine. The connection opens, the query runs in 5ms, and the connection closes. With 10 concurrent users, you now have 10 simultaneous connection attempts hitting PostgreSQL or MySQL. PostgreSQL defaults to max_connections = 100 and MySQL to 151, so you won’t hit a hard limit immediately. But each connection consumes RAM on the database server, typically 5-10MB per connection. Ten connections is 50-100MB of overhead before any query processing.

The Real Killer: Connection Thundering Herd

The worse scenario is when your database server is on a different machine or a managed service like RDS. Each new connection requires a TCP handshake, SSL/TLS negotiation (if enabled), and authentication. That’s 3-5 round trips before your query even starts. With 10 concurrent users, you’re doing 10 handshakes simultaneously. The database server has to absorb a burst of new connections at once (PostgreSQL forks a backend process for each one), and your request latency jumps from 5ms to 200ms just from connection overhead.
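The round-trip arithmetic is easy to sketch. This is a rough model, not a measurement: estimateLatencyMs is a hypothetical helper, and the 20ms round-trip time in the usage note is an assumed figure for an app and database in different data centers.

```javascript
// Rough model: connection setup costs (round trips x RTT), and the query
// itself costs one more round trip plus its execution time.
function estimateLatencyMs({ queryMs, rttMs, setupRoundTrips }) {
  const setupMs = setupRoundTrips * rttMs; // handshake, TLS, authentication
  const roundTripQueryMs = rttMs + queryMs; // send the query, get rows back
  return { setupMs, queryMs: roundTripQueryMs, totalMs: setupMs + roundTripQueryMs };
}
```

With estimateLatencyMs({ queryMs: 5, rttMs: 20, setupRoundTrips: 5 }), setup alone accounts for 100ms of a 125ms total. Set setupRoundTrips to 0, which is what an already-open pooled connection gives you, and the total falls to 25ms.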

I debugged a production incident where a Node.js API on a $5 DigitalOcean droplet was making 15 connections per second to a managed Postgres instance. The database logs showed “sorry, too many clients already.” The fix was a connection pool of 5 persistent connections managed by pg-pool. Response times dropped from 800ms to 40ms instantly.

Setting Up a Proper Pool

If you’re using PostgreSQL with Node.js, install pg and create a pool at application startup:

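const { Pool } = require('pg');

// Create one pool at startup and share it across all requests.
// Connection settings come from the standard PG* environment variables.
const pool = new Pool({
  max: 10, // maximum number of clients in the pool
  idleTimeoutMillis: 30000, // close clients that sit idle for 30 seconds
  connectionTimeoutMillis: 2000, // give up if no client can be obtained within 2 seconds
});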

Then in your route handlers, use pool.query() instead of creating a new client. The pool reuses connections and queues queries if all connections are busy. For MySQL, the mysql2 package offers a similar createPool interface. For MongoDB, the native driver pools by default—just make sure you’re not calling mongoose.connect inside every request.
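If the “reuses connections and queues queries” behaviour feels abstract, here is a toy pool that shows the idea in a few lines. This is an illustration only and not how pg implements its Pool; TinyPool and its method names are hypothetical, and createConnection is whatever function you pass in to open a connection.

```javascript
// Toy pool: hands out at most `max` connections and makes extra callers
// wait until one is released.
class TinyPool {
  constructor(max, createConnection) {
    this.max = max;
    this.createConnection = createConnection;
    this.idle = []; // connections ready for reuse
    this.created = 0; // connections opened so far
    this.waiters = []; // callers waiting for a free connection
  }

  async acquire() {
    if (this.idle.length > 0) return this.idle.pop();
    if (this.created < this.max) {
      this.created++;
      return this.createConnection();
    }
    // Pool exhausted: wait until release() hands a connection over.
    return new Promise((resolve) => this.waiters.push(resolve));
  }

  release(conn) {
    const next = this.waiters.shift();
    if (next) next(conn); // pass it straight to the longest waiter
    else this.idle.push(conn);
  }

  // Runs `work` with a pooled connection and always gives it back.
  async run(work) {
    const conn = await this.acquire();
    try {
      return await work(conn);
    } finally {
      this.release(conn);
    }
  }
}
```

Five concurrent calls to run() on a TinyPool with max set to 2 open only two connections; the other three callers wait in line and reuse them. That is the whole trick: the handshake cost is paid a couple of times at startup instead of once per request.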

The Synchronous Middleware Cascade

Your authentication middleware, rate limiter, request validator, and CORS handler are probably all synchronous functions that run sequentially for every request. With one user, the overhead is negligible. With 10 concurrent users, you’ve created a cascade of synchronous work that multiplies your response time.

Consider a typical Express middleware stack for a protected endpoint:

  1. cors() – lightweight, usually fine.
  2. helmet() – sets security headers, still fine.
  3. express.json() – parses the request body.
  4. rateLimit() – checks Redis or in-memory store.
  5. authMiddleware() – decodes a JWT and fetches user from DB.
  6. validateBody() – runs Joi or Zod schema validation.
  7. Route handler – does the actual work.

Steps 3, 5, and 6 are the dangerous ones. express.json() buffers the request body and then runs a synchronous JSON.parse on it. By default it rejects bodies larger than 100KB with a 413 error, but if you raise that limit and someone sends a 500KB JSON payload, the parse blocks the event loop for its full duration. JWT decoding with jsonwebtoken.verify() is synchronous by default: it runs the HMAC or RSA verification on the main thread. If you have 10 concurrent requests all hitting verify() at once, each one takes 2-5ms, but they execute sequentially, adding 20-50ms of overhead before the last request reaches its handler.
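You can check the parsing cost on your own machine rather than taking my word for it. The sketch below uses only Node built-ins; makeJsonPayload and timeJsonParse are hypothetical helpers, and the sample objects are made up, so real payloads with deeper nesting may behave differently.

```javascript
const { performance } = require('node:perf_hooks');

// Builds a JSON array string of roughly `targetBytes` characters
// out of small made-up records.
function makeJsonPayload(targetBytes) {
  const items = [];
  let size = 2; // the surrounding brackets
  while (size < targetBytes) {
    const item = { id: items.length, name: 'user-' + items.length, tags: ['a', 'b'] };
    size += JSON.stringify(item).length + 1; // +1 for the separating comma
    items.push(item);
  }
  return JSON.stringify(items);
}

// Measures how long the main thread is held while JSON.parse runs.
function timeJsonParse(json) {
  const start = performance.now();
  const parsed = JSON.parse(json);
  return { ms: performance.now() - start, items: parsed.length };
}
```

Try timeJsonParse(makeJsonPayload(500 * 1024)) and multiply the result by the number of requests you expect to arrive together. Whatever number you get is time during which no other request makes progress.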

The JWT Verification Bottleneck

I once profiled an API where JWT verification accounted for 35% of total response time under load. Be aware that passing a callback to jwt.verify() does not move the signature check off the main thread; the callback form mainly lets the library fetch the key asynchronously. What does help is caching the decoded payload per token until the token’s expiry minus a few seconds, so repeat requests skip verification entirely. If you’re using a public key from a JWKS endpoint, fetch and cache that key (for example, once per minute) instead of downloading it on every request.
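A token cache along those lines can be sketched in a few lines. Everything here is an assumption for illustration: createCachedVerifier is a hypothetical name, verifyFn is whatever function you already use to check a token (for example a thin wrapper around jwt.verify with your secret), and the payload is assumed to carry a standard exp claim in seconds.

```javascript
// Wraps a verify function with a per-token cache. `verifyFn` takes a token
// string and returns its decoded payload, or throws if the token is invalid.
function createCachedVerifier(verifyFn, { skewSeconds = 5, now = () => Date.now() } = {}) {
  const cache = new Map(); // token -> { payload, expiresAtMs }

  return function verifyCached(token) {
    const hit = cache.get(token);
    if (hit && hit.expiresAtMs > now()) return hit.payload;
    cache.delete(token); // drop a stale entry, if any

    const payload = verifyFn(token); // the expensive signature check
    if (typeof payload.exp === 'number') {
      // `exp` is in seconds; stop trusting the entry a little early.
      const expiresAtMs = (payload.exp - skewSeconds) * 1000;
      if (expiresAtMs > now()) cache.set(token, { payload, expiresAtMs });
    }
    return payload;
  };
}
```

This sketch never evicts entries for tokens that are not seen again, so a production version would want a size cap or an LRU policy. It also means a revoked token stays valid until its cached entry expires, which is a trade-off to weigh against the CPU savings.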

Parallelizing Middleware

Some middleware can run in parallel. If your rate limiter checks Redis and your auth middleware checks a database, those two operations don’t depend on each other. You can run them concurrently using Promise.all inside a custom middleware wrapper:

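async function parallelMiddleware(req, res, next) {
  try {
    // Assumes rateLimiter and authMiddleware take req and return promises.
    // Both start immediately, so their I/O waits overlap.
    await Promise.all([
      rateLimiter(req),
      authMiddleware(req),
    ]);
    next();
  } catch (err) {
    next(err); // hand failures to Express's error handler
  }
}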

This doesn’t reduce the CPU time the event loop spends on middleware. What it does is overlap the two I/O waits, so each request’s middleware latency approaches the slower of the two checks instead of their sum. Every millisecond you save per request compounds when 10 users hit the server simultaneously.

The Hidden Garbage Collector Tax

JavaScript’s garbage collector (GC) is automatic, but it’s not free. Under low load, GC runs infrequently and takes a few milliseconds. Under 10 concurrent users, your server creates more objects per second—request objects, response objects, buffers, parsed JSON objects, database result rows. All that memory allocation forces V8’s GC to run more often and for longer durations.

V8 does part of its collection work on helper threads, but every GC cycle still includes pauses on the main thread. While the main thread is paused, no JavaScript code executes. If a pause takes 100ms, every request that arrives during that window gets delayed. With 10 concurrent users, GC runs more often, so a request is more likely to arrive during a pause. This creates a feedback loop: more users → more objects → more GC pauses → slower responses → users retry → even more objects.

How to Spot GC Pauses

Run your Node.js process with the --trace-gc flag and watch the logs. You’ll see lines like:

[3613:0x102800000]   1337 ms: Scavenge 4.0 (6.0) -> 3.2 (7.0) MB, 1.8 / 0.0 ms
[3613:0x102800000]   2891 ms: Mark-sweep 12.5 (15.2) -> 11.0 (17.5) MB, 12.3 / 0.0 ms

If you see Mark-sweep pauses exceeding 50ms, you have a GC problem. The fix is usually to reduce object allocation per request. Reuse objects where possible, avoid creating large intermediate arrays in your route handlers, and consider raising --max-old-space-size (the value is in megabytes) to give V8 more room before triggering a full GC.
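If you would rather catch long pauses from inside the process than read trace output, Node’s perf_hooks module can report GC events. The following is a minimal sketch: watchGcPauses and its onPause callback are hypothetical names, and the threshold is whatever you pick (the 50ms figure above is a reasonable start).

```javascript
const { PerformanceObserver } = require('node:perf_hooks');

// Calls `onPause(durationMs)` for every GC event that lasts at least
// `thresholdMs`. Returns the observer so the caller can disconnect() it.
function watchGcPauses(thresholdMs, onPause) {
  const observer = new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) {
      if (entry.duration >= thresholdMs) onPause(entry.duration);
    }
  });
  observer.observe({ entryTypes: ['gc'] });
  return observer;
}
```

A typical use would be watchGcPauses(50, (ms) => console.warn('GC pause of ' + ms.toFixed(1) + 'ms')) at startup. Observing GC adds a little overhead of its own, so it may be worth enabling only while you are investigating.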

Pooling Objects and Buffers

For high-throughput endpoints, pre-allocate buffers and objects outside the request handler. If you’re building a real-time chat server that processes 1000 messages per second, don’t create a new message object for each incoming WebSocket frame. Use a simple object pool:

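// Released message objects wait here for reuse.
const messagePool = [];

// Take an object from the pool, or create one if the pool is empty.
function getMessage() {
  return messagePool.pop() || { userId: null, text: null, timestamp: null };
}

// Clear the fields so no stale data leaks into the next message.
function releaseMessage(msg) {
  msg.userId = null;
  msg.text = null;
  msg.timestamp = null;
  messagePool.push(msg);
}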

This pattern is common in game servers and iGaming platforms where every millisecond matters. For a typical CRUD API, you probably don’t need object pooling, but being aware of allocation patterns helps you avoid GC pressure under moderate load.

What 10 Concurrent Users Actually Means

Let me demystify that number. Ten concurrent users doesn’t mean ten people are actively clicking buttons at the exact same microsecond. In web terms, it means ten open TCP connections that are all in various stages of request processing. One user might be waiting for a database query, another is sending a file upload, a third is getting rate-limited.

Each connection consumes a file descriptor. On most Linux distributions the default soft limit is 1024 file descriptors per process. Ten connections is nothing; you won’t hit that limit. But each connection also consumes a small amount of RAM for the socket buffer, typically 16KB to 64KB. Ten connections means roughly 160KB to 640KB of buffer space. Still trivial.

The real issue is that ten concurrent requests amplify any inefficiency in your code by an order of magnitude. A 10ms synchronous operation that’s invisible during local testing becomes a 100ms queue delay for the last of ten concurrent requests. A badly configured database client that opens a new connection per request becomes ten simultaneous handshakes that time out.

The Real-World Anecdote

I helped a friend debug his side project, a small tournament bracket generator for a gaming community. He had 50 active users, but peak times saw maybe 8-10 concurrent requests. His Node.js server ran on a $10/month VPS. Response times were 2-3 seconds during peak. The database was SQLite, running on the same machine, with no connection pooling. Each request opened a new SQLite connection, which involves taking a file lock. With 10 concurrent requests, 9 of them were waiting for the file lock to release. Switching to better-sqlite3 with WAL mode and a connection pool of 2 dropped response times to 80ms. He didn’t need a bigger server. He needed to change how the app talked to its database.

The Practical Takeaway

Stop thinking about scaling horizontally until you’ve exhausted what a single process can do. Your Node.js backend doesn’t slow down because it can’t handle 10 users. It slows down because you’re blocking the event loop, misconfiguring your database connections, or ignoring garbage collection pressure. Profile your middleware stack for synchronous operations. Set up a small connection pool (max: 5 to 10) and watch your response times drop. Run a GC trace to see if V8 is fighting for memory.

The next time you hit 10 concurrent users and see latency spike, don’t reach for Kubernetes or a load balancer. Reach for node --inspect and a profiler. Nine times out of ten, the bottleneck is in your own code, not your infrastructure. Fix that first, and you might find your “tiny” server can handle 100 concurrent users without breaking a sweat.