
Graceful Shutdown in Node.js: Zero Job Loss During Deployments
Learn how to implement production-grade graceful shutdown in Node.js to prevent job loss during deployments. Master signal handling, worker coordination, and zero-downtime patterns.
You deploy a new version of your Node.js application. Everything looks fine. Then you check your logs and discover dozens of failed jobs, incomplete database transactions, and angry users complaining about lost data.
This is the harsh reality of deployments without graceful shutdown. When Kubernetes sends a SIGTERM to your pod, or when you restart your Docker container, your application has seconds to clean up before being forcibly killed. Without proper shutdown handling, in-flight requests are dropped, background jobs are abandoned, and database connections are severed mid-transaction.
The good news? Implementing graceful shutdown isn't complicated. In this guide, we'll build a production-grade shutdown system that handles worker cleanup, ongoing jobs, database connections, and HTTP server draining—all while maintaining zero data loss.
Why Graceful Shutdown Matters
Modern deployment strategies like rolling updates, blue-green deployments, and autoscaling rely on frequent application restarts. In container orchestration platforms like Kubernetes, pods are constantly created and destroyed based on scaling policies, node maintenance, and deployments.
Here's what happens during a typical Kubernetes deployment without graceful shutdown:
- Kubernetes sends SIGTERM to your pod
- Your application continues processing jobs
- After 30 seconds (default), Kubernetes sends SIGKILL
- All in-flight work is immediately terminated
- Jobs are lost, connections are broken, data is corrupted
The impact is severe:
- Background jobs processing payments are interrupted
- API requests return 502 errors mid-processing
- Database transactions are rolled back
- Message queues show failed jobs requiring manual intervention
- Users experience data loss and inconsistent state
According to the Kubernetes documentation on pod termination, the default grace period is 30 seconds. That's your window to clean up gracefully before the nuclear option (SIGKILL) arrives.
Understanding SIGTERM vs SIGINT
Before we implement shutdown logic, let's understand the signals your application receives:
SIGTERM (Signal Terminate)
- Sent by container orchestrators (Kubernetes, Docker, ECS)
- Polite request to terminate
- Can be caught and handled
- Production deployment signal
SIGINT (Signal Interrupt)
- Sent when you press Ctrl+C in the terminal
- Keyboard interrupt during development
- Can be caught and handled
- Development signal
SIGKILL (Signal Kill)
- Cannot be caught or ignored
- Immediately terminates the process
- Sent after grace period expires
- The "nuclear option"
The key principle: handle SIGTERM and SIGINT gracefully to avoid SIGKILL.
// Basic signal handling structure
process.on('SIGTERM', () => gracefulShutdown('SIGTERM'));
process.on('SIGINT', () => gracefulShutdown('SIGINT'));
// Never try to handle SIGKILL - it cannot be caught
// process.on('SIGKILL', ...) // ❌ This won't work
For more details on Node.js signal events, see the official Node.js documentation.
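A small variant: registering each handler with process.once guards against a repeated signal (for example, a second Ctrl+C) starting the sequence twice before the isShuttingDown flag used later in this guide takes effect. A minimal sketch:
// Sketch: register each shutdown signal exactly once.
// process.once ensures a repeated SIGTERM/SIGINT does not start a second shutdown.
const shutdownSignals = ['SIGTERM', 'SIGINT'] as const;

for (const signal of shutdownSignals) {
  process.once(signal, () => {
    void gracefulShutdown(signal);
  });
}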
The Ordered Cleanup Sequence
Graceful shutdown isn't just about closing connections—it's about doing it in the right order. The sequence matters because each component depends on the previous one:
async function gracefulShutdown(signal: string) {
console.log(`Received ${signal}, starting graceful shutdown`);
try {
// Phase 1: Stop accepting new work (parallel - independent operations)
await Promise.all([stopWorkers(), drainHttpServer()]);
// Phase 2: Wait for in-flight work to complete
await waitForOngoingJobs();
// Phase 3: Close messaging infrastructure
await closeQueues();
// Phase 4: Close database connections (must be last - jobs may use DB)
await closeDatabaseConnections();
console.log('Graceful shutdown complete');
process.exit(0);
} catch (error) {
console.error('Error during shutdown:', error);
process.exit(1);
}
}
Why this order?
- Phase 1 (parallel): Stop workers and HTTP server simultaneously—both stop accepting new work
- Phase 2: Let current jobs and requests finish processing
- Phase 3: Close queue connections to message brokers
- Phase 4: Database last—jobs may need DB access until they complete
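The sequence above assumes each phase finishes promptly. If you want to bound phases individually rather than relying only on a single global watchdog, a small wrapper helps; withTimeout below is a hypothetical helper, not part of the code above:
// Hypothetical helper: run one shutdown phase with its own time budget.
// If the phase overruns, log a warning and move on instead of hanging forever.
async function withTimeout(name: string, budgetMs: number, phase: () => Promise<unknown>): Promise<void> {
  let timer: NodeJS.Timeout | undefined;
  const timedOut = new Promise<'timeout'>((resolve) => {
    timer = setTimeout(() => resolve('timeout'), budgetMs);
  });

  const result = await Promise.race([phase().then(() => 'done' as const), timedOut]);
  if (timer !== undefined) clearTimeout(timer);

  if (result === 'timeout') {
    console.warn(`Shutdown phase "${name}" exceeded ${budgetMs}ms, continuing`);
  }
}

// Example usage with the phase functions from the sequence above:
// await withTimeout('stop intake', 10000, () => Promise.all([stopWorkers(), drainHttpServer()]));
// await withTimeout('database', 5000, closeDatabaseConnections);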
Let's implement each step in detail.
Step 1: Stopping BullMQ Workers Gracefully
BullMQ is a popular Node.js queue library built on Redis. When shutting down, we need to stop workers from accepting new jobs while allowing current jobs to finish.
import { Worker, Queue } from 'bullmq';
import IORedis from 'ioredis';
// Create shared Redis connection
const connection = new IORedis({
host: process.env.REDIS_HOST || 'localhost',
port: parseInt(process.env.REDIS_PORT || '6379'),
maxRetriesPerRequest: null,
});
// Track all workers for shutdown coordination
const allWorkers: Worker[] = [];
const allQueues: Queue[] = [];
// Example: Email processing worker
const emailWorker = new Worker(
'email',
async (job) => {
console.log(`Processing email job ${job.id}`);
// Simulate email sending
await sendEmail(job.data.to, job.data.subject, job.data.body);
console.log(`Email job ${job.id} completed`);
},
{ connection },
);
// Track worker for shutdown
allWorkers.push(emailWorker);
// Example: Report generation worker
const reportWorker = new Worker(
'reports',
async (job) => {
console.log(`Generating report ${job.id}`);
// This might take several minutes
const report = await generateComplexReport(job.data.userId);
await saveReportToStorage(report);
console.log(`Report ${job.id} completed`);
},
{
connection,
concurrency: 5, // Process 5 reports simultaneously
},
);
allWorkers.push(reportWorker);
// Graceful worker shutdown
async function stopWorkers() {
console.log(`Closing ${allWorkers.length} workers`);
// Close all workers in parallel
// This stops them from accepting new jobs
await Promise.all(allWorkers.map((worker) => worker.close()));
console.log('All workers stopped accepting new jobs');
}
What worker.close() does:
- Stops pulling new jobs from Redis
- Allows currently processing jobs to complete
- Resolves when all active jobs are finished
- Rejects if timeout is exceeded
According to the BullMQ graceful shutdown documentation, calling worker.close() is the recommended approach for production deployments.
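If you also want a hard cap on this phase, you can race worker.close() against a timer and fall back to a forced close. A sketch, assuming BullMQ's close(force) signature, where close(true) stops waiting for active jobs and leaves them to be recovered by stalled-job handling:
// Sketch: cap how long we wait for workers to finish their active jobs.
async function stopWorkersWithDeadline(deadlineMs: number) {
  const graceful = Promise.all(allWorkers.map((worker) => worker.close()));
  const timedOut = new Promise<'timeout'>((resolve) =>
    setTimeout(() => resolve('timeout'), deadlineMs).unref(),
  );

  const result = await Promise.race([graceful.then(() => 'closed' as const), timedOut]);
  if (result === 'timeout') {
    console.warn(`Workers still busy after ${deadlineMs}ms, forcing close`);
    await Promise.all(allWorkers.map((worker) => worker.close(true)));
  }
}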
Step 2: Grace Periods for In-Flight Jobs
Even after closing workers, some jobs might still be processing. We need to give them time to complete before closing the underlying connections.
async function waitForOngoingJobs(timeoutMs: number = 5000) {
console.log(`Waiting ${timeoutMs}ms for ongoing jobs to complete`);
await new Promise((resolve) => setTimeout(resolve, timeoutMs));
console.log('Grace period elapsed');
}
// Alternative: Track active jobs explicitly
class JobTracker {
private activeJobs = new Set<string>();
addJob(jobId: string) {
this.activeJobs.add(jobId);
}
removeJob(jobId: string) {
this.activeJobs.delete(jobId);
}
async waitForCompletion(timeoutMs: number = 10000): Promise<boolean> {
const startTime = Date.now();
while (this.activeJobs.size > 0) {
if (Date.now() - startTime > timeoutMs) {
console.warn(`Timeout: ${this.activeJobs.size} jobs still active`);
return false;
}
await new Promise((resolve) => setTimeout(resolve, 100));
}
return true;
}
getActiveCount(): number {
return this.activeJobs.size;
}
}
// Usage with tracker
const jobTracker = new JobTracker();
const trackedWorker = new Worker(
'tracked-jobs',
async (job) => {
jobTracker.addJob(job.id!);
try {
await processJob(job.data);
} finally {
jobTracker.removeJob(job.id!);
}
},
{ connection },
);
// In shutdown handler
async function smartWaitForJobs() {
const completed = await jobTracker.waitForCompletion(15000);
if (!completed) {
console.error(`Forced shutdown: ${jobTracker.getActiveCount()} jobs were interrupted`);
}
}
Choosing the right timeout:
- Short jobs (API calls, emails): 5 seconds
- Medium jobs (image processing, reports): 15-30 seconds
- Long jobs (video encoding, ML training): Consider moving to separate services
Note: Your timeout should be less than Kubernetes' terminationGracePeriodSeconds (default 30s). Leave buffer time for the remaining shutdown steps.
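One way to keep the numbers honest is to derive every phase timeout from a single budget that stays below the grace period. A minimal sketch, assuming a hypothetical SHUTDOWN_BUDGET_MS environment variable:
// Hypothetical budget split: keep the total safely under the pod's grace period.
const totalBudgetMs = parseInt(process.env.SHUTDOWN_BUDGET_MS || '25000', 10);

const budgets = {
  jobs: Math.floor(totalBudgetMs * 0.6),    // in-flight jobs get the biggest slice
  http: Math.floor(totalBudgetMs * 0.2),    // HTTP request draining
  cleanup: Math.floor(totalBudgetMs * 0.2), // queues, Redis, database
};

// e.g. await jobTracker.waitForCompletion(budgets.jobs);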
Step 3: Closing Queues and Redis Connections
After workers are stopped and jobs are complete, close the queue connections and shared Redis clients.
import { Queue, QueueEvents } from 'bullmq';
// Create queues for job submission
const emailQueue = new Queue('email', { connection });
const reportQueue = new Queue('reports', { connection });
allQueues.push(emailQueue, reportQueue);
// Optional: Track queue events
const emailEvents = new QueueEvents('email', { connection });
emailEvents.on('completed', ({ jobId }) => {
console.log(`Email job ${jobId} completed`);
});
emailEvents.on('failed', ({ jobId, failedReason }) => {
console.error(`Email job ${jobId} failed: ${failedReason}`);
});
async function closeQueues() {
console.log(`Closing ${allQueues.length} queues`);
await Promise.all(allQueues.map((queue) => queue.close()));
console.log('All queues closed');
}
async function closeRedisConnections() {
  console.log('Closing Redis connections');
  // Close event listeners first - they still use Redis under the hood
  await emailEvents.close();
  // Close the shared connection last
  await connection.quit();
  console.log('Redis connections closed');
}
Important: Always close queues before closing the underlying Redis connection. Closing Redis first will cause errors in queue cleanup.
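Keeping that ordering in one function makes it harder to get wrong. A sketch of a combined teardown, using the emailEvents, allQueues, and connection instances from above:
// Sketch: close messaging in dependency order - consumers of the connection
// first (QueueEvents, then Queues), the shared ioredis connection last.
async function closeMessaging() {
  await emailEvents.close();
  await Promise.all(allQueues.map((queue) => queue.close()));
  await connection.quit();
}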
Step 4: HTTP Server Drain Pattern
While workers handle background jobs, your HTTP server handles incoming API requests. During shutdown, we need to:
- Stop accepting new connections
- Allow existing requests to complete
- Close the server gracefully
import { serve } from '@hono/node-server';
import { Hono } from 'hono';
const app = new Hono();
// Example: Long-running endpoint
app.post('/api/process', async (c) => {
const data = await c.req.json();
// This might take 10 seconds
const result = await performHeavyComputation(data);
return c.json({ result });
});
// Start server and keep reference
const server = serve({
fetch: app.fetch,
port: parseInt(process.env.PORT || '3000'),
});
console.log('Server running on port 3000');
async function drainHttpServer(timeoutMs: number = 10000) {
  console.log('Draining HTTP server');
  return new Promise<void>((resolve, reject) => {
    // Force-resolve after the timeout so a slow drain cannot block shutdown
    const forceTimer = setTimeout(() => {
      console.warn('HTTP server drain timeout, forcing close');
      resolve();
    }, timeoutMs);

    // Stop accepting new connections; in-flight requests are allowed to finish
    server.close((err) => {
      clearTimeout(forceTimer);
      if (err) {
        console.error('Error closing HTTP server:', err);
        reject(err);
      } else {
        console.log('HTTP server closed');
        resolve();
      }
    });
  });
}
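One caveat: server.close() only stops new connections, so idle keep-alive sockets can hold the server open until they time out. On Node 18.2+, http.Server exposes closeIdleConnections() and closeAllConnections(), which you can fold into the drain. A sketch, assuming the server returned by @hono/node-server is a plain http.Server:
import type { Server } from 'node:http';

// Sketch: proactively drop keep-alive sockets so close() can complete promptly.
function dropKeepAliveSockets(httpServer: Server, forceAfterMs: number) {
  // Close sockets that have no request in flight right now
  httpServer.closeIdleConnections();

  // After the grace window, drop everything that is still open
  setTimeout(() => httpServer.closeAllConnections(), forceAfterMs).unref();
}

// e.g. call dropKeepAliveSockets(server as Server, 5000) just before server.close()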
Express.js pattern:
import express from 'express';
const app = express();
const server = app.listen(3000);
async function drainExpressServer() {
return new Promise<void>((resolve, reject) => {
server.close((err) => {
if (err) reject(err);
else resolve();
});
});
}
Fastify pattern:
import Fastify from 'fastify';
const fastify = Fastify({ logger: true });
async function drainFastifyServer() {
// Fastify has built-in graceful shutdown
await fastify.close();
}
Fastify has built-in graceful shutdown support, making it one of the cleanest patterns to implement.
Step 5: Database Connection Cleanup
Database connections should be closed last because all previous steps might need database access to complete jobs or persist state.
import { Pool } from 'pg';
// PostgreSQL connection pool
const pool = new Pool({
host: process.env.DB_HOST,
port: parseInt(process.env.DB_PORT || '5432'),
database: process.env.DB_NAME,
user: process.env.DB_USER,
password: process.env.DB_PASSWORD,
max: 20, // Maximum 20 connections
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 2000,
});
async function closeDatabaseConnections() {
console.log('Closing database connections');
// Wait for all queries to finish
await pool.end();
console.log('Database connections closed');
}
// Prisma pattern
import { PrismaClient } from '@prisma/client';
const prisma = new PrismaClient();
async function closePrisma() {
await prisma.$disconnect();
}
// Mongoose pattern (MongoDB)
import mongoose from 'mongoose';
async function closeMongoose() {
await mongoose.connection.close();
}
Connection pool behavior during shutdown:
- New connection requests are rejected
- Active queries are allowed to complete
- Idle connections are closed immediately
- Pool waits for active connections to finish
- Timeout forces closure if queries hang
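Note that pg's pool.end() waits for checked-out clients to be released, so to get the "timeout forces closure" behavior you can bound it yourself. A minimal sketch:
// Sketch: bound pool.end() so a hung query cannot stall shutdown indefinitely.
async function closeDatabaseWithDeadline(deadlineMs = 5000) {
  const timedOut = new Promise<'timeout'>((resolve) =>
    setTimeout(() => resolve('timeout'), deadlineMs).unref(),
  );

  const result = await Promise.race([pool.end().then(() => 'closed' as const), timedOut]);
  if (result === 'timeout') {
    console.warn(`Database pool did not close within ${deadlineMs}ms`);
  }
}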
Complete Production Implementation
Here's a complete, production-ready graceful shutdown implementation:
import { Worker, Queue } from 'bullmq';
import { serve } from '@hono/node-server';
import { Hono } from 'hono';
import { Pool } from 'pg';
import IORedis from 'ioredis';
// ============================================================
// INFRASTRUCTURE SETUP
// ============================================================
// Redis connection
const redisConnection = new IORedis({
host: process.env.REDIS_HOST || 'localhost',
port: parseInt(process.env.REDIS_PORT || '6379'),
maxRetriesPerRequest: null,
retryStrategy: (times) => Math.min(times * 50, 2000),
});
// Database pool
const dbPool = new Pool({
host: process.env.DB_HOST,
database: process.env.DB_NAME,
user: process.env.DB_USER,
password: process.env.DB_PASSWORD,
max: 20,
});
// HTTP server
const app = new Hono();
app.get('/health', (c) => c.json({ status: 'healthy' }));
app.post('/api/jobs', async (c) => {
const data = await c.req.json();
// Add job to queue
await jobQueue.add('process', data);
return c.json({ status: 'queued' });
});
const httpServer = serve({
fetch: app.fetch,
port: parseInt(process.env.PORT || '3000'),
});
// ============================================================
// WORKER SETUP
// ============================================================
const allWorkers: Worker[] = [];
const allQueues: Queue[] = [];
// Create queue
const jobQueue = new Queue('jobs', { connection: redisConnection });
allQueues.push(jobQueue);
// Create worker
const jobWorker = new Worker(
'jobs',
async (job) => {
console.log(`Processing job ${job.id}`);
// Perform database operations
const client = await dbPool.connect();
try {
await client.query('BEGIN');
// Simulate work
await performBusinessLogic(job.data);
await client.query('COMMIT');
} catch (error) {
await client.query('ROLLBACK');
throw error;
} finally {
client.release();
}
console.log(`Job ${job.id} completed`);
},
{
connection: redisConnection,
concurrency: 10,
},
);
allWorkers.push(jobWorker);
console.log('Application started successfully');
// ============================================================
// GRACEFUL SHUTDOWN HANDLER
// ============================================================
let isShuttingDown = false;
async function gracefulShutdown(signal: string) {
// Prevent multiple shutdown attempts
if (isShuttingDown) {
console.log('Shutdown already in progress');
return;
}
isShuttingDown = true;
console.log(`Received ${signal}, starting graceful shutdown`);
const shutdownTimeout = setTimeout(() => {
console.error('Shutdown timeout exceeded, forcing exit');
process.exit(1);
}, 25000); // Less than Kubernetes' 30s default
try {
// 1. Stop accepting new work
console.log('Step 1: Closing workers');
await Promise.all(allWorkers.map((w) => w.close()));
console.log('✓ Workers closed');
// 2. Wait for ongoing jobs
console.log('Step 2: Waiting for ongoing jobs (5s grace period)');
await new Promise((resolve) => setTimeout(resolve, 5000));
console.log('✓ Grace period elapsed');
// 3. Close queues
console.log('Step 3: Closing queues');
await Promise.all(allQueues.map((q) => q.close()));
console.log('✓ Queues closed');
// 4. Drain HTTP server
console.log('Step 4: Draining HTTP server');
await new Promise<void>((resolve) => {
httpServer.close(() => {
console.log('✓ HTTP server closed');
resolve();
});
// Force close after 5 seconds
setTimeout(resolve, 5000);
});
// 5. Close database connections
console.log('Step 5: Closing database connections');
await dbPool.end();
console.log('✓ Database connections closed');
// 6. Close Redis connection
console.log('Step 6: Closing Redis connection');
await redisConnection.quit();
console.log('✓ Redis connection closed');
clearTimeout(shutdownTimeout);
console.log('Graceful shutdown complete');
process.exit(0);
} catch (error) {
console.error('Error during graceful shutdown:', error);
clearTimeout(shutdownTimeout);
process.exit(1);
}
}
// Register signal handlers
process.on('SIGTERM', () => gracefulShutdown('SIGTERM'));
process.on('SIGINT', () => gracefulShutdown('SIGINT'));
// Catch uncaught errors
process.on('uncaughtException', (error) => {
console.error('Uncaught exception:', error);
gracefulShutdown('uncaughtException');
});
process.on('unhandledRejection', (reason) => {
console.error('Unhandled rejection:', reason);
gracefulShutdown('unhandledRejection');
});
// Dummy business logic
async function performBusinessLogic(data: any) {
await new Promise((resolve) => setTimeout(resolve, 1000));
}
Testing Shutdown Behavior Locally
Testing graceful shutdown is crucial before deploying to production. Here's how to test locally:
Manual Testing with Signal Sending
# Terminal 1: Start your application
npm start
# Terminal 2: Find your process ID
pgrep -f "node dist/index.js"
# or
lsof -i :3000
# Send SIGTERM to specific process
kill -SIGTERM <PID>
# Test with SIGINT (Ctrl+C)
# Just press Ctrl+C in Terminal 1
Use kill with -SIGTERM to simulate what Kubernetes and Docker send during deployments. Always target the specific process ID rather than using pkill node, which would terminate all Node.js processes on your machine.
Docker Testing
# Dockerfile
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
COPY . .
# Important: Use exec form to ensure signals are forwarded
CMD ["node", "dist/index.js"]
The exec form matters: the shell form (CMD node dist/index.js without the JSON array) wraps your process in /bin/sh, which does not forward SIGTERM, so your shutdown handlers never run.
Test shutdown in Docker:
# Build and run
docker build -t myapp .
docker run --name myapp-test -p 3000:3000 myapp
# Send SIGTERM
docker stop myapp-test
# Check logs to verify graceful shutdown
docker logs myapp-test
# Test forced kill (SIGKILL after 10s)
docker stop -t 10 myapp-test
Kubernetes Testing
# pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp-test
spec:
  terminationGracePeriodSeconds: 30
  containers:
    - name: app
      image: myapp:latest
      ports:
        - containerPort: 3000
      env:
        - name: NODE_ENV
          value: production
      lifecycle:
        preStop:
          exec:
            # Optional: Additional cleanup before SIGTERM
            command: ['/bin/sh', '-c', 'sleep 5']
Test in Kubernetes:
# Deploy pod
kubectl apply -f pod.yaml
# Watch logs in one terminal
kubectl logs -f myapp-test
# Delete pod in another terminal (triggers SIGTERM)
kubectl delete pod myapp-test
# Verify graceful shutdown in logs
Automated Testing Script
// test-shutdown.ts
import { spawn, ChildProcess } from 'child_process';
async function testGracefulShutdown() {
console.log('Starting application...');
const app = spawn('node', ['dist/index.js'], {
env: { ...process.env, PORT: '3001' },
});
app.stdout.on('data', (data) => {
console.log(`[APP] ${data}`);
});
app.stderr.on('data', (data) => {
console.error(`[ERROR] ${data}`);
});
// Wait for startup
await new Promise((resolve) => setTimeout(resolve, 3000));
console.log('Sending SIGTERM...');
app.kill('SIGTERM');
// Track shutdown duration
const startTime = Date.now();
app.on('exit', (code, signal) => {
const duration = Date.now() - startTime;
console.log(`\nShutdown completed in ${duration}ms`);
console.log(`Exit code: ${code}`);
console.log(`Signal: ${signal}`);
if (code === 0 && duration < 25000) {
console.log('✓ Graceful shutdown successful');
} else {
console.error('✗ Graceful shutdown failed');
process.exit(1);
}
});
}
testGracefulShutdown().catch(console.error);
Run the test:
npx tsx test-shutdown.ts
Production Considerations
Monitoring Shutdown Metrics
Track these metrics to ensure graceful shutdown is working:
import { Counter, Histogram } from 'prom-client';
const shutdownDurationHistogram = new Histogram({
name: 'shutdown_duration_seconds',
help: 'Time taken for graceful shutdown',
buckets: [1, 5, 10, 15, 20, 25, 30],
});
const jobsInterruptedCounter = new Counter({
name: 'jobs_interrupted_total',
help: 'Number of jobs interrupted during shutdown',
});
async function monitoredShutdown(signal: string) {
const startTime = Date.now();
try {
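    // Note: for this metric to be recorded, gracefulShutdown must resolve here
    // instead of calling process.exit() itself; the value also needs to be
    // exported (for example, pushed) before the process finally exits.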
await gracefulShutdown(signal);
const duration = (Date.now() - startTime) / 1000;
shutdownDurationHistogram.observe(duration);
} catch (error) {
jobsInterruptedCounter.inc();
throw error;
}
}
Kubernetes Readiness Probes
Stop receiving traffic before shutdown using Kubernetes readiness probes:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: app
          image: myapp:latest
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /health/live
              port: 3000
            initialDelaySeconds: 15
            periodSeconds: 10
Implement health endpoints:
let isReady = true;
app.get('/health/ready', (c) => {
if (!isReady) {
return c.json({ status: 'not ready' }, 503);
}
return c.json({ status: 'ready' });
});
app.get('/health/live', (c) => {
return c.json({ status: 'alive' });
});
// In shutdown handler, mark as not ready first
async function gracefulShutdown(signal: string) {
isReady = false; // Stop receiving new traffic
await new Promise((resolve) => setTimeout(resolve, 5000)); // Let Kubernetes update
// Continue with normal shutdown...
}
Logging and Observability
Add structured logging with pino to track shutdown progress:
import pino from 'pino';
const logger = pino({
level: process.env.LOG_LEVEL || 'info',
formatters: {
level: (label) => ({ level: label }),
},
});
async function gracefulShutdown(signal: string) {
logger.warn({ signal }, 'Starting graceful shutdown');
try {
logger.info('Closing workers');
await stopWorkers();
logger.info({ workerCount: allWorkers.length }, 'Workers closed');
logger.info('Waiting for jobs');
await waitForJobs();
logger.info('Jobs completed');
// ... rest of shutdown
logger.info('Graceful shutdown complete');
process.exit(0);
} catch (error) {
logger.error({ error }, 'Shutdown error');
process.exit(1);
}
}
Key Takeaways
- Handle SIGTERM and SIGINT to avoid forceful SIGKILL termination
- Order matters: Stop workers → wait for jobs → close queues → drain HTTP → close database
- Set timeouts less than Kubernetes' grace period (default 30s)
- Use worker.close() from BullMQ for proper worker shutdown
- Track active jobs explicitly for better visibility
- Test locally with Docker and signal sending before production
- Monitor metrics to catch shutdown issues early
- Update readiness probes to stop receiving traffic during shutdown
Graceful shutdown is not optional for production Node.js applications. With proper implementation, you can achieve zero job loss during deployments, maintain data consistency, and provide a seamless experience for your users — even during the chaos of continuous deployment.
Ready to level up your Node.js reliability? Check out our guide on BullMQ step-based state machines for resilient job processing, explore handling child job failures for advanced error recovery patterns, or learn about production-ready structured logging with Pino for better shutdown observability.