Your JavaScript Code is a Liability: Lessons from the On-Call Trenches
It was 3:14 AM on a Tuesday. I was three years into my career, working for a high-frequency trading platform. We had just migrated our order-matching notification service to Node.js because the “dev-ex” was supposedly superior. I pushed a change that added a simple event listener to a global emitter to track user heartbeats. It looked innocent. It passed CI. It passed staging with two concurrent users. In production, we had 40,000 concurrent WebSocket connections. Within twenty minutes, the RSS (Resident Set Size) of the Node process climbed from 200MB to 4GB. The OOM-killer stepped in, nuked the process, and the load balancer started 502-ing every request. I didn’t just break the site; I cost the company $12,000 per minute in missed trades because I forgot that EventEmitter listeners don’t magically disappear when the local scope ends.
That was my “welcome to the real world” moment. Since then, I’ve spent a decade cleaning up “clean code” that fails under load. Most JavaScript “best practices” you read on generic tech blogs are written by people who have never had to debug a memory leak in a containerized environment while their Slack is exploding with P0 alerts. They tell you to use `const` instead of `let`. I’m here to tell you how to stop your code from becoming a bottleneck when the CPU hits 90%.
The Async/Await Illusion and the Event Loop Bottleneck
Everyone loves async/await. It makes asynchronous code look like synchronous code. That is exactly the problem. It lures developers into a false sense of security, leading them to write sequential code where they should be writing parallel code. I see this “waterfall” pattern in almost every PR I review.
```javascript
// The "I'm making the user wait for no reason" pattern
async function getUserDashboard(userId) {
  const user = await db.users.find(userId);              // 50ms
  const orders = await stripe.listOrders(user.stripeId); // 200ms (network hop to Stripe)
  const preferences = await db.settings.find(userId);    // 30ms
  return { user, orders, preferences };
}
```
This function takes 280ms end to end. Only the Stripe call actually depends on the user lookup; the preferences query is independent and has no reason to wait behind the other two. In a high-traffic environment, those idle milliseconds compound, and you’re holding onto memory for the user object while waiting for Stripe’s API to respond. Instead, you should be leveraging `Promise.allSettled`. I say `allSettled` because `Promise.all` is a “fail-fast” mechanism that is often too aggressive for complex UIs.
- **Stop the Waterfall:** Use `Promise.all` or `Promise.allSettled` for independent I/O operations.
- **Avoid `forEach` with Async:** `Array.prototype.forEach` does not wait for promises. If you use it, you’re firing off a bunch of floating promises that your error handler won’t catch. Use `for...of` if you need sequential execution, or `Promise.all` for parallel.
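Under those rules, the dashboard function above can be restructured so the two independent reads run together; the Stripe call still has to wait for the user record, since it needs `stripeId`. A sketch, with hypothetical stand-ins for the data-layer helpers:

```javascript
// Hypothetical stand-ins for the db/stripe helpers from the waterfall example.
const db = {
  users: { find: async (id) => ({ id, stripeId: 'cus_123' }) },
  settings: { find: async (id) => ({ theme: 'dark' }) },
};
const stripe = { listOrders: async (stripeId) => [{ id: 'ord_1', stripeId }] };

async function getUserDashboard(userId) {
  // user and preferences are independent: fire both, then await together.
  const [userResult, prefsResult] = await Promise.allSettled([
    db.users.find(userId),
    db.settings.find(userId),
  ]);
  if (userResult.status === 'rejected') {
    // Without the user there is no dashboard at all.
    throw new Error('User lookup failed', { cause: userResult.reason });
  }
  const user = userResult.value;
  // The orders call genuinely depends on the user, so it stays sequential.
  const orders = await stripe.listOrders(user.stripeId);
  // Missing preferences degrade gracefully instead of failing the whole page.
  const preferences = prefsResult.status === 'fulfilled' ? prefsResult.value : null;
  return { user, orders, preferences };
}
```

The worst case here drops from 280ms to roughly 250ms; when the dependent call is the cheap one, the savings are much larger.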
Pro-tip: If you’re running Node.js 18+, use the built-in `AbortController` to time out `fetch` requests. Don’t let a hanging third-party API call tie up your event loop indefinitely.
```javascript
const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), 5000);
try {
  const response = await fetch('https://api.stripe.com/v1/charges', {
    signal: controller.signal
  });
  // ...
} catch (err) {
  if (err.name === 'AbortError') {
    console.error("Stripe API timed out. Don't let the user hang.");
  }
} finally {
  clearTimeout(timeoutId);
}
```
Memory Management: The Silent Killer
JavaScript is garbage-collected, which means most developers think they don’t have to worry about memory. This is a dangerous lie. In a long-running Node.js process, a small leak is a ticking time bomb. The most common culprit? Closures and global caches.
I once saw a team implement a “simple” in-memory cache using a plain JavaScript object. They forgot to implement a TTL (Time To Live) or a maximum size. Over three days, the object grew to 1.2GB. When the V8 engine tried to perform a Full GC (Garbage Collection), it paused the entire process for 1.5 seconds. In a real-time system, a 1.5s pause is an eternity. It triggers health check failures, which causes Kubernetes to restart the pod, which leads to a “crash loop backoff” because the new pod immediately tries to hydrate the same massive cache.
If you need a cache, use `lru-cache`. If you need to associate data with an object without preventing that object from being garbage collected, use a `WeakMap`.
```javascript
// This is how you leak memory
const userCache = new Map();
function processUser(user) {
  userCache.set(user.id, { ...user, processedAt: Date.now() });
  // If users are never removed, this Map grows forever.
}
```
```javascript
// This is how you stay sane
const userMetadata = new WeakMap();
function processUser(user) {
  userMetadata.set(user, { processedAt: Date.now() });
  // When 'user' is no longer referenced elsewhere, this entry is eligible for GC.
}
```
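If pulling in `lru-cache` isn’t an option, the two properties it buys you — a hard size cap and per-entry expiry — fit in a few lines. A rough sketch only, not the real library and not LRU-accurate on reads:

```javascript
// Minimal size-bounded, TTL-aware cache. Illustrates what lru-cache gives
// you out of the box; not production-grade (no recency bump on get, etc.).
class BoundedCache {
  constructor({ max = 500, ttlMs = 60_000 } = {}) {
    this.max = max;
    this.ttlMs = ttlMs;
    this.map = new Map(); // Map preserves insertion order: oldest entry is first
  }
  set(key, value) {
    if (this.map.size >= this.max && !this.map.has(key)) {
      // Evict the oldest entry so the cache can never grow unbounded.
      this.map.delete(this.map.keys().next().value);
    }
    this.map.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
  get(key) {
    const entry = this.map.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.map.delete(key); // lazy expiry on read
      return undefined;
    }
    return entry.value;
  }
}
```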
V8’s heap is split into “New Space” and “Old Space.” Most objects die young in the New Space. But if your object survives long enough, it gets promoted to the Old Space. GC in the Old Space is expensive. Stop creating unnecessary objects in hot loops. If you’re processing 10,000 rows from a database, don’t map them into new objects three times just because you like the “functional” syntax.
Error Handling Beyond Try/Catch
Most JavaScript error handling is garbage. Developers either swallow errors with an empty catch block or throw generic strings. If I see `throw "Error occurred"` in a codebase, I immediately lose faith in the author. Always throw an `Error` object, and, starting with Node 16.9, use the `cause` property to chain errors.
```javascript
try {
  await db.connect('localhost:5432');
} catch (err) {
  throw new Error('Failed to initialize data layer', { cause: err });
}
```
Why does this matter? Because when you’re looking at a stack trace in Datadog or Sentry at 4 AM, you need to know the root cause. A stack trace that just says “Failed to initialize data layer” is useless. You need to see the `ECONNREFUSED` from the underlying driver.
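To get that root cause into a log line, you can walk the `cause` chain. A tiny helper (my own, not from any library):

```javascript
// Walk err.cause links (ES2022 / Node 16.9+) and collect every message,
// so the log shows the root ECONNREFUSED, not just the top-level wrapper.
function causeChain(err) {
  const messages = [];
  for (let e = err; e instanceof Error; e = e.cause) {
    messages.push(e.message);
  }
  return messages.join(' <- ');
}

const root = new Error('connect ECONNREFUSED 127.0.0.1:5432');
const wrapped = new Error('Failed to initialize data layer', { cause: root });
console.error(causeChain(wrapped));
// prints "Failed to initialize data layer <- connect ECONNREFUSED 127.0.0.1:5432"
```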
Also, stop using `process.on('uncaughtException')` as a catch-all to keep the process alive. If an uncaught exception occurs, your process is in an undefined state. It might have a half-written file descriptor or a locked database connection. The only safe thing to do is log the error and crash. Let your orchestrator (K8s, Nomad, PM2) restart the process.
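The log-and-crash handler itself is short; all the interesting recovery happens in the orchestrator. A sketch (the `console.error` line stands in for whatever structured logger you actually use):

```javascript
process.on('uncaughtException', (err) => {
  // Process state is undefined past this point: no cleanup heroics,
  // just one structured log line and a non-zero exit.
  console.error(JSON.stringify({
    level: 'fatal',
    msg: 'uncaught exception, shutting down',
    err: err.message,
    stack: err.stack,
  }));
  process.exit(1); // let K8s / PM2 restart us with a clean heap
});
```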
- **Use Structured Logging:** Stop using `console.log`. It’s synchronous when writing to files or pipes in certain conditions, and it lacks metadata. Use `pino`. It’s one of the fastest loggers in the ecosystem and outputs JSON by default.
- **Operational vs. Programmer Errors:** Distinguish between things that will happen (network timeout) and things that shouldn’t happen (null pointer dereference). Handle the former; crash on the latter.
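For illustration only, here is roughly the contract `pino` gives you — one JSON object per line with bound metadata — minus all of its performance work. A toy, not a pino replacement:

```javascript
// Minimal structured logger: one machine-parseable JSON line per event.
function makeLogger(base = {}) {
  const emit = (level) => (msg, fields = {}) => {
    const line = JSON.stringify({ level, time: Date.now(), msg, ...base, ...fields });
    process.stdout.write(line + '\n');
    return line; // returned for convenience; pino itself doesn't do this
  };
  return { info: emit('info'), warn: emit('warn'), error: emit('error') };
}

const log = makeLogger({ service: 'orders' });
log.info('order created', { orderId: 'ord_1' });
```

The point is that every line is a queryable object: your log aggregator can filter on `service` or `orderId` instead of grepping free text.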
The Dependency Black Hole
When it comes to dependencies, the best practice is usually “less is more.” Every time you `npm install`, you are taking a mortgage out on your project’s future maintenance. I’ve seen `node_modules` folders that were 2GB for a simple CRUD app. This isn’t just about disk space; it’s about the attack surface and the “dependency hell” that occurs when two packages require different versions of the same peer dependency.
Before adding a package, ask: “Can I write this in 10 lines of native JS?” You don’t need `left-pad`. You don’t need `is-13`. You probably don’t even need `lodash` anymore, given that modern engines have `Array.prototype.flat`, `Object.fromEntries`, and optional chaining.
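A few of the one-liners that used to justify `lodash`, in plain modern JS:

```javascript
// _.flattenDeep
const flat = [1, [2, [3, [4]]]].flat(Infinity);

// _.fromPairs
const obj = Object.fromEntries([['a', 1], ['b', 2]]);

// _.get(user, 'address.city', 'unknown') — optional chaining plus nullish coalescing
const user = { address: null };
const city = user.address?.city ?? 'unknown';

console.log(flat, obj, city); // [1, 2, 3, 4] { a: 1, b: 2 } unknown
```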
Note to self: Always use `npm ci` in your CI/CD pipelines instead of `npm install`. It ensures you get the exact versions in your lockfile and fails if the lockfile and `package.json` are out of sync. It’s faster and more deterministic.
And for the love of all that is holy, audit your dependencies. But don’t just run `npm audit fix` blindly. Most of those “vulnerabilities” are in dev-dependencies or are “ReDoS” (Regular Expression Denial of Service) risks that don’t actually affect your production runtime. Be surgical.
TypeScript: The Necessary Evil
I used to hate TypeScript. I thought it was just Java-lite for people who were afraid of dynamic languages. I was wrong. In a large-scale system, TypeScript is the only thing that keeps the “refactor-induced-outage” at bay. But most people use it wrong.
If your codebase is littered with `any`, you don’t have a typed system; you have a complicated way of writing bad JavaScript. Use `unknown` for data coming from external APIs and validate it at the boundary using a library like `zod`.
```typescript
import { z } from 'zod';

const UserSchema = z.object({
  id: z.string().uuid(),
  email: z.string().email(),
  retryCount: z.number().int().min(0),
});

async function fetchUser(id: string) {
  const response = await fetch(`http://localhost:3000/users/${id}`);
  const data = await response.json();
  // Validate at the edge!
  const result = UserSchema.safeParse(data);
  if (!result.success) {
    throw new Error('Invalid user data from internal API', { cause: result.error });
  }
  return result.data; // result.data is now fully typed
}
```
This “Parse, don’t validate” pattern ensures that once data enters your core logic, it is guaranteed to be correct. You stop checking `if (user && user.id)` everywhere. It cleans up the code and prevents those “Cannot read property ‘id’ of undefined” errors that haunt your logs.
The Event Loop: Don’t Block the Main Thread
JavaScript is single-threaded. This is its greatest strength and its greatest weakness. You don’t have to worry about mutexes or race conditions on variables, but you can easily freeze your entire server with a single CPU-intensive task.
I once saw a service that generated large CSV reports. The developer used `JSON.parse` on a 100MB blob and then ran a complex `.reduce()` on the resulting array. While that `.reduce()` was running (about 4 seconds), the server couldn’t respond to a single HTTP request. The health check timed out, the load balancer pulled the node, and the user never got their report.
If you have to do heavy lifting, you have three options:
- **Offload to a Worker Thread:** Node.js has a `worker_threads` module. Use it for CPU-bound tasks like image processing or heavy math.
- **Chunking:** Break the work into smaller pieces using `setImmediate()`. This allows the event loop to handle pending I/O between chunks.
- **Microservices:** If a task is consistently heavy, it shouldn’t be in your API server. Move it to a background worker (e.g., BullMQ with Redis).
```javascript
// Chunking example
function processHugeArray(items, doWork) {
  const chunk = items.splice(0, 100); // note: splice mutates the caller's array
  doWork(chunk);
  if (items.length > 0) {
    setImmediate(() => processHugeArray(items, doWork));
  }
}
```
Why `setImmediate` and not `setTimeout(fn, 0)`? Because `setImmediate` is designed to run right after the current “poll” phase of the event loop, making it more efficient for breaking up long-running tasks without adding the minimum 4ms delay that `setTimeout` often incurs in browsers or the timer overhead in Node.
The “Real World” Gotcha: JSON.parse is a Landmine
This is the one that gets everyone. `JSON.parse` and `JSON.stringify` are synchronous. If you are building a high-performance API and you are parsing large payloads, you are blocking the event loop. I’ve seen production systems where 20% of the latency was just `JSON.parse`.
If you’re dealing with massive JSON files, use a streaming parser like `stream-json`. It’s more complex to write, but it keeps your memory footprint low and your event loop responsive. For 90% of cases, just being aware of the size of your JSON is enough. If your internal microservice is sending a 50MB JSON response, you have a design problem, not a JavaScript problem.
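Being “aware of the size” can be mechanical: gate the synchronous parse on a byte budget. The 1MB figure below is an arbitrary illustration — tune it to your own latency SLO:

```javascript
const MAX_JSON_BYTES = 1 * 1024 * 1024; // assumed budget; tune per service

function parseWithBudget(buf) {
  // Refuse to block the event loop on a payload we know is oversized.
  // Oversized bodies belong in a streaming parser or a background worker.
  if (buf.length > MAX_JSON_BYTES) {
    throw new Error(`Payload of ${buf.length} bytes exceeds sync-parse budget`);
  }
  return JSON.parse(buf);
}
```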
Observability: If You Can’t Measure It, It’s Broken
The single most important practice for SREs is instrumentation. You need to know what your code is doing inside the black box. This means more than just logs. You need metrics and traces.
In Node.js, you should always expose the default V8 metrics. Use the `prom-client` library to expose a `/metrics` endpoint for Prometheus. Watch `nodejs_eventloop_lag_seconds`. If that number starts climbing, your process is struggling, even if CPU usage looks low. Event loop lag is the “gold standard” metric for Node.js health.
```javascript
import client from 'prom-client';

// Collects heap, GC, and event-loop-lag metrics out of the box.
// (Recent prom-client versions compute these on scrape; the old
// `timeout` polling option is gone.)
const collectDefaultMetrics = client.collectDefaultMetrics;
collectDefaultMetrics();

// Custom metric to track Stripe API latency
const stripeLatency = new client.Histogram({
  name: 'stripe_api_latency_seconds',
  help: 'Latency of Stripe API calls',
  buckets: [0.1, 0.5, 1, 2, 5]
});
```
When you have this data, you stop guessing. You don’t say “I think the database is slow.” You say “The 95th percentile of our database queries increased by 200ms following the last deployment.” That is the difference between a junior developer and a senior SRE.
The Myth of “Clean Code” in JavaScript
I’m going to take a stand here: “Clean Code” as defined by Uncle Bob is often a disaster in JavaScript. Deeply nested inheritance, excessive abstraction, and the “one function should only have three lines” rule lead to fragmented codebases that are impossible to trace. In the JS world, readability and “grep-ability” are king.
I would much rather see a 50-line function that clearly describes a business process from top to bottom than five different files with three-line functions that I have to jump between using “Go to Definition” just to understand how a user logs in. JavaScript’s strength is its functional leanings. Use them. Keep your logic flat. Avoid `this` whenever possible; it’s a context-shifting nightmare that leads to more bugs than it solves.
Summary of the “Done with Hype” Checklist
- **Node.js Version:** Stay on LTS (Long Term Support). Don’t run production on odd-numbered versions (like Node 21); they are short-lived Current releases that never get LTS. Use Node 20 or 22.
- **ESM vs CommonJS:** Just move to ESM (`"type": "module"`). It’s 2024. The ecosystem has mostly caught up, and the tree-shaking benefits are real.
- **Environment Variables:** Use a library like `dotenv` or the native `--env-file` flag in Node 20.6+. Never, ever hardcode a URL like `localhost:3000`.
- **Security:** Use `helmet` for Express/Fastify. It sets sensible HTTP headers. It takes 5 seconds to implement and prevents a dozen basic attack vectors.
- **Testing:** Don’t aim for 100% coverage. It’s a vanity metric. Aim for 100% coverage of your “happy paths” and 100% coverage of the “fuck-up” scenarios you’ve actually encountered in production.
- **CI/CD:** If your build takes more than 5 minutes, you’re doing it wrong. Cache your `node_modules`. Parallelize your tests. A slow CI is a direct tax on developer productivity.
JavaScript is a powerful, messy, brilliant tool. It’s the duct tape of the internet. But if you treat it like a toy, it will bite you. Stop following the hype cycles of the latest framework and start focusing on the fundamentals: memory, the event loop, and observability. Your on-call rotation will thank you.
Stop over-engineering your abstractions and start engineering your reliability, because at the end of the day, the user doesn’t care if you used a Monad. They care that the page loaded.