It’s 3:14 AM, the load balancer is throwing 504s like a malfunctioning pitching machine, and I’ve found the culprit in a “clever” one-liner.
I haven’t slept in two days. My eyes feel like someone scrubbed them with industrial-grade sandpaper, and my bloodstream is currently 40% caffeine and 60% pure, unadulterated spite. While most of you were tucked in, dreaming of “clean code” and whatever new framework was released on Twitter four hours ago, I was staring at a terminal screen watching our production environment choke to death.
The system didn’t just crash. It underwent a slow, agonizing heat death. Our Node.js v20.11.0 runtime was gasping for air, the V8 garbage collector was running at 99% CPU utilization trying to reclaim memory that was being held hostage by a series of closures that some “senior” developer thought were elegant.
1. The 3:00 AM Alert
This is what greeted me. This isn’t a simulation. This is the raw output from the container logs right before the kernel OOM-killed the process:
<--- Last few GCs --->
[14209:0x64c0000] 172345 ms: Mark-sweep 2031.4 (2048.0) -> 2030.1 (2048.0) MB, 1240.5 / 0.0 ms (average mu = 0.124, current mu = 0.002) allocation failure; GC in old space requested
<--- JS stacktrace --->
FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
1: 0xb83f70 node::Abort() [node]
2: 0xa93f0b [node]
3: 0xd64770 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
4: 0xd64af7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
5: 0xf42265 [node]
6: 0xf42d48 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
7: 0xf1f95e v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
8: 0xee0f57 v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [node]
9: 0x12a3f71 v8::internal::Runtime_AllocateInYoungGeneration(int, v8::internal::Address*, v8::internal::Isolate*) [node]
10: 0x1703459 [generated code]
I ran process.memoryUsage() via a diagnostic socket I had to hack into the running process because nobody thought to include a proper telemetry sidecar. The results were a nightmare:
{
rss: 2147483648,
heapTotal: 2056253440,
heapUsed: 2045123584,
external: 157286400,
arrayBuffers: 146800640
}
The heapUsed is sitting at 2.04GB. On a container limited to 2GB. Do the math. The V8 engine’s “Orinoco” garbage collector was trapped in a “stop-the-world” Mark-Sweep-Compact cycle that lasted over a second, only to realize it couldn’t free a single byte. This is what happens when you treat memory like an infinite resource.
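We later bolted on a lightweight heap watchdog so this never again goes unnoticed until the OOM killer shows up. This is a minimal sketch, not a real telemetry sidecar: the 2GB limit mirrors our container, and the 85% threshold and 30-second interval are illustrative choices.

```javascript
// Hypothetical heap watchdog -- just enough to page someone before the
// OOM killer does. Limit and threshold are illustrative assumptions.
const HEAP_LIMIT_BYTES = 2 * 1024 ** 3; // matches the 2GB container limit
const WARN_RATIO = 0.85;
const MB = 1024 ** 2;

function checkHeap() {
  const { rss, heapUsed, heapTotal } = process.memoryUsage();
  if (heapUsed > HEAP_LIMIT_BYTES * WARN_RATIO) {
    console.warn(
      `Heap pressure: ${(heapUsed / MB).toFixed(0)}MB used / ` +
      `${(heapTotal / MB).toFixed(0)}MB total (rss ${(rss / MB).toFixed(0)}MB)`
    );
  }
}

// unref() so this timer never keeps the process alive on its own
setInterval(checkHeap, 30_000).unref();
```

Thirty lines of paranoia beats a 3:14 AM page.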
2. The Investigation
I started by pulling a heap dump. If you’ve never done one, it’s the only way to see the skeletons in the closet. Modern Node can write a snapshot on a signal, no third-party heapdump module required:
node --heapsnapshot-signal=SIGUSR2 index.js
# In another terminal
kill -USR2 <pid>
# Drops a .heapsnapshot file you can load into Chrome DevTools
I grepped the codebase for the usual suspects. I found a “clever” caching layer that used a plain JavaScript object as a map but never implemented a TTL (Time To Live) or a maximum size. It was just a global variable named requestCache. Every single incoming request’s metadata was being pushed into this object.
When we talk about JavaScript best practices, we aren’t just talking about making the code look pretty for a PR review; we are talking about keeping the process alive when the heap hits 2GB. The “clever” developer used a closure to “protect” the cache, which meant the references to the massive buffers attached to those request objects stayed reachable forever, and the garbage collector couldn’t reclaim a single byte.
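For the record, a bounded cache is not hard. Here is a minimal sketch of what requestCache should have been; MAX_ENTRIES and TTL_MS are illustrative numbers, not tuned values, and a real service should reach for a battle-tested LRU library instead.

```javascript
// A minimal bounded TTL cache -- a sketch of what requestCache should have
// been. MAX_ENTRIES and TTL_MS are illustrative, not tuned numbers.
const MAX_ENTRIES = 10_000;
const TTL_MS = 60_000;

const requestCache = new Map();

function cacheSet(key, value) {
  // Map iterates in insertion order, so the first key is the oldest entry.
  if (requestCache.size >= MAX_ENTRIES) {
    requestCache.delete(requestCache.keys().next().value);
  }
  requestCache.set(key, { value, expiresAt: Date.now() + TTL_MS });
}

function cacheGet(key) {
  const entry = requestCache.get(key);
  if (!entry) return undefined;
  if (Date.now() > entry.expiresAt) {
    requestCache.delete(key); // Lazy expiry: evict stale entries on read
    return undefined;
  }
  return entry.value;
}
```

Twenty lines. A size cap and a TTL. That is all it would have taken to keep the pod alive.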
I spent four hours using grep -r "==" . and grep -r "var" . because apparently, we’ve decided that ECMAScript 2015 never happened and we’re still living in the Wild West of 2012.
3. The Global Scope Graveyard
The first thing I found was a blatant disregard for scope. In Node.js v20.11.0, the global object is not your playground. Someone had written a middleware that attached a “context” object to the global scope to “avoid prop drilling.”
This is architectural malpractice. By attaching data to the global scope, you are essentially telling the V8 engine: “Please never, ever delete this data. I want it to live until the heat death of the universe or until the SREs kill the pod, whichever comes first.”
Before (The Outage-Inducing Garbage):
// middleware/auth.js
app.use((req, res, next) => {
// Missing 'const' or 'let' - automatically becomes global in non-strict mode
// Even in strict mode, some genius did:
global.currentUserContext = {
token: req.headers.authorization,
user: req.body.user, // Massive object
timestamp: Date.now()
};
next();
});
After (The Fix):
// middleware/auth.js
"use strict";
const { AsyncLocalStorage } = require('node:async_hooks');
const authContext = new AsyncLocalStorage();
app.use((req, res, next) => {
const context = {
userId: req.body.user?.id, // Only store what you need; optional chaining in case user is absent
timestamp: Date.now()
};
// AsyncLocalStorage provides a safe way to track state across async calls
// without polluting the global namespace or leaking memory between requests.
authContext.run(context, () => {
next();
});
});
The fix involves using AsyncLocalStorage, which is the proper way to handle request-scoped state in modern Node.js. It ensures that when the request is finished, the context is eligible for garbage collection.
4. The Async/Await Suicide Pact
I found a loop in the batch processing service that was using Array.prototype.forEach with an async callback. This is a classic junior mistake that leads to unhandled race conditions and memory spikes. forEach does not wait for promises. It fires them all off simultaneously and then moves on. If you have 10,000 items, you just spawned 10,000 concurrent database connections.
The logs showed the database connection pool screaming for mercy before the Node process finally gave up.
Before (The Outage-Inducing Garbage):
async function processOrders(orders) {
// This fires all promises at once and ignores the return values
orders.forEach(async (order) => {
try {
await db.orders.update(order.id, { status: 'processed' });
} catch (e) {
console.log(e); // Errors are swallowed, no retry logic
}
});
console.log('All orders processed!'); // This lies. They aren't done.
}
After (The Fix):
const pLimit = require('p-limit'); // Concurrency limiter (default export; v4+ is ESM-only)
async function processOrders(orders) {
const limit = pLimit(10); // Process 10 at a time, don't kill the DB
const tasks = orders.map(order => {
return limit(async () => {
try {
return await db.orders.update(order.id, { status: 'processed' });
} catch (err) {
// Proper error handling and logging
logger.error({ orderId: order.id, err }, 'Failed to process order');
throw err; // Re-throw so the failure surfaces through Promise.all
}
});
});
try {
await Promise.all(tasks);
logger.info('Batch processing complete');
} catch (err) {
logger.fatal({ err }, 'Batch processing failed catastrophically');
}
}
By using a concurrency limit and Promise.all, we control the pressure on downstream services and ensure the event loop isn’t flooded with thousands of microtasks in a single tick.
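If you cannot pull in p-limit, the idea fits in a dozen lines. Here is a dependency-free sketch of the same concurrency-limiting pattern; it is not a drop-in replacement for the library, just the core mechanism.

```javascript
// Dependency-free concurrency limiter -- a sketch of the p-limit idea:
// at most `max` tasks in flight at once, the rest wait in a queue.
function createLimiter(max) {
  let active = 0;
  const queue = [];
  const next = () => {
    if (active >= max || queue.length === 0) return;
    active++;
    const { fn, resolve, reject } = queue.shift();
    fn().then(resolve, reject).finally(() => {
      active--;
      next(); // Pull the next waiting task off the queue
    });
  };
  return (fn) =>
    new Promise((resolve, reject) => {
      queue.push({ fn, resolve, reject });
      next();
    });
}
```

Ten concurrent updates instead of ten thousand. The database thanks you.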
5. The Equality Hallucination
I found a bug in the billing logic where 0 == false was being used to check if a user had a zero balance. In JavaScript, the Abstract Equality Comparison Algorithm is a nightmare. If you use ==, you are asking the engine to perform a series of type conversions that are often counter-intuitive.
In our case, a string "0" coming from a legacy API was being compared to false. Guess what? "0" == false is true. The system thought the user had no balance and skipped the payment gateway. We lost thousands of dollars in revenue because someone was too lazy to type a third equals sign.
Before (The Outage-Inducing Garbage):
function shouldChargeUser(balance) {
// If balance is 0, "0", or false, this logic fails
if (balance == false) {
return false;
}
return true;
}
After (The Fix):
/**
* Strict equality is a non-negotiable JavaScript best practice.
* We check for specific types and values.
*/
function shouldChargeUser(balance) {
if (typeof balance !== 'number') {
throw new TypeError('Balance must be a numeric value');
}
// Strict equality (===) does not perform type coercion
if (balance === 0) {
return false;
}
return balance > 0;
}
Stop using loose equality. There is no excuse. If you want to check for null or undefined, use val == null if you must, but even then, I’d rather see val === null || val === undefined. Be explicit. Code is for humans to read and machines to execute; don’t make the humans guess what the machine will do.
6. The Closure Coffin
This was the hardest one to find. We had a logging utility that was capturing the request object in a closure to provide “contextual logging.” Because the request object contains references to the socket, and the socket contains references to the buffers, we were keeping the entire request-response cycle alive in memory long after the client had disconnected.
V8’s garbage collector works on reachability. If a closure in a setTimeout or a long-running promise chain still has a reference to a variable in its outer scope, that variable stays in the “Old Space.”
Before (The Outage-Inducing Garbage):
function logDelayed(req, message) {
// The 'req' object is now trapped in this closure for 10 seconds
setTimeout(() => {
console.log(`Request ${req.id}: ${message}`);
}, 10000);
}
After (The Fix):
function logDelayed(req, message) {
// Extract only the primitives we need.
// Primitives are copied, not referenced like objects.
const requestId = req.id;
setTimeout(() => {
console.log(`Request ${requestId}: ${message}`);
// The large 'req' object can now be garbage collected
}, 10000);
}
In the “After” example, the large req object is no longer reachable by the callback. The only thing kept in memory is the requestId string. This is the difference between a stable system and one that crashes every Friday at 5:00 PM.
7. The Event Loop Blockade
I saw a JSON.parse call on a 500MB string. In Node.js, the event loop is single-threaded. If you spend 2 seconds parsing a massive JSON object, you are blocking every other request. This is why the load balancer was throwing 504s. The Node process wasn’t dead; it was just busy doing one thing and ignoring the thousands of other connections waiting in the queue.
I used node --inspect and the “Profiler” tab in Chrome DevTools to see the flame graph. A single block of yellow (JavaScript execution) was stretching across the screen like a mountain range.
Before (The Outage-Inducing Garbage):
const fs = require('fs');
function loadConfig() {
// Synchronous file read + Synchronous JSON parse = Event Loop Death
const data = fs.readFileSync('./massive-config.json', 'utf8');
return JSON.parse(data);
}
After (The Fix):
const { createReadStream } = require('node:fs');
const JSONStream = require('JSONStream'); // Streaming JSON parser
function loadConfig() {
// Use streams to process data in chunks without blocking the loop
return new Promise((resolve, reject) => {
const parser = JSONStream.parse('items.*'); // Emit each element of the top-level "items" array
const results = [];
createReadStream('./massive-config.json')
.on('error', reject)
.pipe(parser)
.on('data', (item) => results.push(item))
.on('end', () => resolve(results))
.on('error', reject);
});
}
If you have to deal with large datasets, use streams. Node.js was built for streaming. If you are loading 500MB into a string, you’ve already lost.
8. The Try/Catch Mirage in Async Loops
I found several instances where try/catch was wrapped around an asynchronous call inside a map function, but the promises weren’t being awaited correctly, or the errors were being swallowed in a way that left the system in an inconsistent state.
When you’re running a runtime that has shipped Promise.allSettled since ES2020, you have no excuse for not handling every possible failure state.
Before (The Outage-Inducing Garbage):
async function updateInventory(items) {
return items.map(async (item) => {
try {
await api.update(item);
} catch (e) {
// "I'll fix this later" - The developer who now owes me 48 hours of my life
return null;
}
});
}
After (The Fix):
async function updateInventory(items) {
const results = await Promise.allSettled(items.map(item => api.update(item)));
const failures = results.filter(r => r.status === 'rejected');
const successes = results.filter(r => r.status === 'fulfilled');
if (failures.length > 0) {
logger.warn({
failureCount: failures.length,
errors: failures.map(f => f.reason.message)
}, 'Some inventory updates failed');
// Implement actual retry logic or dead-letter queue
await handleFailures(failures);
}
return successes.map(s => s.value);
}
Promise.allSettled is your friend. It ensures that one failure doesn’t blow up the entire batch, but it also gives you a full report of what went wrong.
The Verdict
I am tired. I am cynical. And I am disappointed.
We are running Node.js v20.11.0, one of the most sophisticated execution environments ever built, and we are treating it like a sandbox for toddlers. The V8 engine is a marvel of engineering—it uses speculative optimization, hidden classes, and a multi-generational garbage collector to make JavaScript fast. But it cannot save you from yourself.
If you don’t understand how the heap works, you shouldn’t be writing backend code. If you think == is “fine because I know what’s in the variable,” you are a liability. If you write a closure that captures a 100MB buffer and then wonder why the pod is crashing, you need to go back to basics.
Here is the new reality:
1. Strict Mode is Mandatory: No more implicit globals.
2. No More var: If I see a var in a PR, I will reject it without comment.
3. Memory Profiling: Every new service must have a heap dump analysis as part of its load testing.
4. Linting is Not Optional: Our ESLint config is being tightened. eqeqeq is now an error, not a warning.
5. Streams for Everything: If you are processing more than 1MB of data, you use a stream.
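Rule 4 translates directly into config. Here is a sketch of the tightened ESLint setup in flat-config format; the rule names are real ESLint core rules, but the severity choices are our policy, not defaults.

```javascript
// eslint.config.js -- a sketch of the tightened ruleset (flat config).
// Rule names are real ESLint core rules; the severities are our policy.
module.exports = [
  {
    rules: {
      eqeqeq: ['error', 'always', { null: 'ignore' }], // === everywhere; tolerate == null
      'no-var': 'error',                               // let/const only
      'no-implicit-globals': 'error',                  // no accidental globals
      strict: ['error', 'global'],                     // "use strict" in CJS files
    },
  },
];
```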
I’m going home. I’m going to turn off my phone, and I’m going to sleep for fourteen hours. When I come back, I expect to see a series of PRs deleting the “clever” code I found tonight.
Don’t talk to me about “velocity” or “shipping fast.” Shipping garbage fast just means you’re creating more work for me at 3:00 AM. Learn JavaScript best practices and stick to them, or find another profession where your “creativity” doesn’t result in a 48-hour production outage.
The logs don’t lie. The heap dump doesn’t care about your feelings. Fix the code.
Incident Status: Resolved (Physically).
Mental Status: Critical.
Action Items: Delete the requestCache object immediately. Stop using forEach for async tasks. Use ===.
Signed,
The Architect who is one “unhandledRejection” away from moving to a cabin in the woods.