
Graceful Shutdown in Node.js: Zero Job Loss During Deployments
Learn how to implement production-grade graceful shutdown in Node.js to prevent job loss during deployments. Master signal handling, worker coordination, and zero-downtime patterns.
You deploy a new version of your Node.js application. Everything looks fine. Then you check your logs and discover dozens of failed jobs, incomplete database transactions, and angry users complaining about lost data.
This is the harsh reality of deployments without graceful shutdown. When Kubernetes sends a SIGTERM to your pod, or when you restart your Docker container, your application has seconds to clean up before being forcibly killed. Without proper shutdown handling, in-flight requests are dropped, background jobs are abandoned, and database connections are severed mid-transaction.
The good news? Implementing graceful shutdown isn't complicated. In this guide, we'll build a production-grade shutdown system that handles worker cleanup, ongoing jobs, database connections, and HTTP server draining—all while maintaining zero data loss.
Why Graceful Shutdown Matters
Modern deployment strategies like rolling updates, blue-green deployments, and autoscaling rely on frequent application restarts. In container orchestration platforms like Kubernetes, pods are constantly created and destroyed based on scaling policies, node maintenance, and deployments.
Here's what happens during a typical Kubernetes deployment without graceful shutdown:
- Kubernetes sends SIGTERM to your pod
- Your application continues processing jobs
- After 30 seconds (default), Kubernetes sends SIGKILL
- All in-flight work is immediately terminated
- Jobs are lost, connections are broken, data is corrupted
The impact is severe:
- Background jobs processing payments are interrupted
- API requests return 502 errors mid-processing
- Database transactions are rolled back
- Message queues show failed jobs requiring manual intervention
- Users experience data loss and inconsistent state
According to the Kubernetes documentation on pod termination, the default grace period is 30 seconds. That's your window to clean up gracefully before the nuclear option (SIGKILL) arrives.
Understanding SIGTERM vs SIGINT
Before we implement shutdown logic, let's understand the signals your application receives:
SIGTERM (Signal Terminate)
- Sent by container orchestrators (Kubernetes, Docker, ECS)
- Polite request to terminate
- Can be caught and handled
- Production deployment signal
SIGINT (Signal Interrupt)
- Sent when you press Ctrl+C in the terminal
- Keyboard interrupt during development
- Can be caught and handled
- Development signal
SIGKILL (Signal Kill)
- Cannot be caught or ignored
- Immediately terminates the process
- Sent after grace period expires
- The "nuclear option"
The key principle: handle SIGTERM and SIGINT gracefully to avoid SIGKILL.
// Basic signal handling structure
process.on('SIGTERM', () => gracefulShutdown('SIGTERM'));
process.on('SIGINT', () => gracefulShutdown('SIGINT'));
// Never try to handle SIGKILL - it cannot be caught
// process.on('SIGKILL', ...) // ❌ This won't work
For more details on Node.js signal events, see the official Node.js documentation.
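A small variant: registering each handler with process.once guards against a repeated signal (for example, a second Ctrl+C) starting the sequence twice before the isShuttingDown flag used later in this guide takes effect. A minimal sketch:
// Sketch: register each shutdown signal exactly once.
// process.once ensures a repeated SIGTERM/SIGINT does not start a second shutdown.
const shutdownSignals = ['SIGTERM', 'SIGINT'] as const;

for (const signal of shutdownSignals) {
  process.once(signal, () => {
    void gracefulShutdown(signal);
  });
}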
The Ordered Cleanup Sequence
Graceful shutdown isn't just about closing connections—it's about doing it in the right order. The sequence matters because each component depends on the previous one:
async function gracefulShutdown(signal: string) {
console.log(`Received ${signal}, starting graceful shutdown`);
try {
// Phase 1: Stop accepting new work (parallel - independent operations)
await Promise.all([stopWorkers(), drainHttpServer()]);
// Phase 2: Wait for in-flight work to complete
await waitForOngoingJobs();
// Phase 3: Close messaging infrastructure
await closeQueues();
// Phase 4: Close database connections (must be last - jobs may use DB)
await closeDatabaseConnections();
console.log('Graceful shutdown complete');
process.exit(0);
} catch (error) {
console.error('Error during shutdown:', error);
process.exit(1);
}
}
Why this order?
- Phase 1 (parallel): Stop workers and HTTP server simultaneously—both stop accepting new work
- Phase 2: Let current jobs and requests finish processing
- Phase 3: Close queue connections to message brokers
- Phase 4: Database last—jobs may need DB access until they complete
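The sequence above assumes each phase finishes promptly. If you want to bound phases individually rather than relying only on a single global watchdog, a small wrapper helps; withTimeout below is a hypothetical helper, not part of the code above:
// Hypothetical helper: run one shutdown phase with its own time budget.
// If the phase overruns, log a warning and move on instead of hanging forever.
async function withTimeout(name: string, budgetMs: number, phase: () => Promise<unknown>): Promise<void> {
  let timer: NodeJS.Timeout | undefined;
  const timedOut = new Promise<'timeout'>((resolve) => {
    timer = setTimeout(() => resolve('timeout'), budgetMs);
  });

  const result = await Promise.race([phase().then(() => 'done' as const), timedOut]);
  if (timer !== undefined) clearTimeout(timer);

  if (result === 'timeout') {
    console.warn(`Shutdown phase "${name}" exceeded ${budgetMs}ms, continuing`);
  }
}

// Example usage with the phase functions from the sequence above:
// await withTimeout('stop intake', 10000, () => Promise.all([stopWorkers(), drainHttpServer()]));
// await withTimeout('database', 5000, closeDatabaseConnections);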
Let's implement each step in detail.
Step 1: Stopping BullMQ Workers Gracefully
BullMQ is a popular Node.js queue library built on Redis. When shutting down, we need to stop workers from accepting new jobs while allowing current jobs to finish.
import { Worker, Queue } from 'bullmq';
import IORedis from 'ioredis';
// Create shared Redis connection
const connection = new IORedis({
host: process.env.REDIS_HOST || 'localhost',
port: parseInt(process.env.REDIS_PORT || '6379'),
maxRetriesPerRequest: null,
});
// Track all workers for shutdown coordination
const allWorkers: Worker[] = [];
const allQueues: Queue[] = [];
// Example: Email processing worker
const emailWorker = new Worker(
'email',
async (job) => {
console.log(`Processing email job ${job.id}`);
// Simulate email sending
await sendEmail(job.data.to, job.data.subject, job.data.body);
console.log(`Email job ${job.id} completed`);
},
{ connection },
);
// Track worker for shutdown
allWorkers.push(emailWorker);
// Example: Report generation worker
const reportWorker = new Worker(
'reports',
async (job) => {
console.log(`Generating report ${job.id}`);
// This might take several minutes
const report = await generateComplexReport(job.data.userId);
await saveReportToStorage(report);
console.log(`Report ${job.id} completed`);
},
{
connection,
concurrency: 5, // Process 5 reports simultaneously
},
);
allWorkers.push(reportWorker);
// Graceful worker shutdown
async function stopWorkers() {
console.log(`Closing ${allWorkers.length} workers`);
// Close all workers in parallel
// This stops them from accepting new jobs
await Promise.all(allWorkers.map((worker) => worker.close()));
console.log('All workers stopped accepting new jobs');
}
What worker.close() does:
- Stops pulling new jobs from Redis
- Allows currently processing jobs to complete
- Resolves when all active jobs are finished
- Rejects if timeout is exceeded
According to the BullMQ graceful shutdown documentation, calling worker.close() is the recommended approach for production deployments.
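If you also want a hard cap on this phase, you can race worker.close() against a timer and fall back to a forced close. A sketch, assuming BullMQ's close(force) signature, where close(true) stops waiting for active jobs and leaves them to be recovered by stalled-job handling:
// Sketch: cap how long we wait for workers to finish their active jobs.
async function stopWorkersWithDeadline(deadlineMs: number) {
  const graceful = Promise.all(allWorkers.map((worker) => worker.close()));
  const timedOut = new Promise<'timeout'>((resolve) =>
    setTimeout(() => resolve('timeout'), deadlineMs).unref(),
  );

  const result = await Promise.race([graceful.then(() => 'closed' as const), timedOut]);
  if (result === 'timeout') {
    console.warn(`Workers still busy after ${deadlineMs}ms, forcing close`);
    await Promise.all(allWorkers.map((worker) => worker.close(true)));
  }
}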
Step 2: Grace Periods for In-Flight Jobs
Even after closing workers, some jobs might still be processing. We need to give them time to complete before closing the underlying connections.
async function waitForOngoingJobs(timeoutMs: number = 5000) {
console.log(`Waiting ${timeoutMs}ms for ongoing jobs to complete`);
await new Promise((resolve) => setTimeout(resolve, timeoutMs));
console.log('Grace period elapsed');
}
// Alternative: Track active jobs explicitly
class JobTracker {
private activeJobs = new Set<string>();
addJob(jobId: string) {
this.activeJobs.add(jobId);
}
removeJob(jobId: string) {
this.activeJobs.delete(jobId);
}
async waitForCompletion(timeoutMs: number = 10000): Promise<boolean> {
const startTime = Date.now();
while (this.activeJobs.size > 0) {
if (Date.now() - startTime > timeoutMs) {
console.warn(`Timeout: ${this.activeJobs.size} jobs still active`);
return false;
}
await new Promise((resolve) => setTimeout(resolve, 100));
}
return true;
}
getActiveCount(): number {
return this.activeJobs.size;
}
}
// Usage with tracker
const jobTracker = new JobTracker();
const trackedWorker = new Worker(
'tracked-jobs',
async (job) => {
jobTracker.addJob(job.id!);
try {
await processJob(job.data);
} finally {
jobTracker.removeJob(job.id!);
}
},
{ connection },
);
// In shutdown handler
async function smartWaitForJobs() {
const completed = await jobTracker.waitForCompletion(15000);
if (!completed) {
console.error(`Forced shutdown: ${jobTracker.getActiveCount()} jobs were interrupted`);
}
}
Choosing the right timeout:
- Short jobs (API calls, emails): 5 seconds
- Medium jobs (image processing, reports): 15-30 seconds
- Long jobs (video encoding, ML training): Consider moving to separate services
Note: Your timeout should be less than Kubernetes' terminationGracePeriodSeconds (default 30s). Leave buffer time for the remaining shutdown steps.
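One way to keep the numbers honest is to derive every phase timeout from a single budget that stays below the grace period. A minimal sketch, assuming a hypothetical SHUTDOWN_BUDGET_MS environment variable:
// Hypothetical budget split: keep the total safely under the pod's grace period.
const totalBudgetMs = parseInt(process.env.SHUTDOWN_BUDGET_MS || '25000', 10);

const budgets = {
  jobs: Math.floor(totalBudgetMs * 0.6),    // in-flight jobs get the biggest slice
  http: Math.floor(totalBudgetMs * 0.2),    // HTTP request draining
  cleanup: Math.floor(totalBudgetMs * 0.2), // queues, Redis, database
};

// e.g. await jobTracker.waitForCompletion(budgets.jobs);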
Step 3: Closing Queues and Redis Connections
After workers are stopped and jobs are complete, close the queue connections and shared Redis clients.
import { Queue, QueueEvents } from 'bullmq';
// Create queues for job submission
const emailQueue = new Queue('email', { connection });
const reportQueue = new Queue('reports', { connection });
allQueues.push(emailQueue, reportQueue);
// Optional: Track queue events
const emailEvents = new QueueEvents('email', { connection });
emailEvents.on('completed', ({ jobId }) => {
console.log(`Email job ${jobId} completed`);
});
emailEvents.on('failed', ({ jobId, failedReason }) => {
console.error(`Email job ${jobId} failed: ${failedReason}`);
});
async function closeQueues() {
console.log(`Closing ${allQueues.length} queues`);
await Promise.all(allQueues.map((queue) => queue.close()));
console.log('All queues closed');
}
async function closeRedisConnections() {
  console.log('Closing Redis connections');
  // Close event listeners first - they still use Redis under the hood
  await emailEvents.close();
  // Close the shared connection last
  await connection.quit();
  console.log('Redis connections closed');
}
Important: Always close queues before closing the underlying Redis connection. Closing Redis first will cause errors in queue cleanup.
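Keeping that ordering in one function makes it harder to get wrong. A sketch of a combined teardown, using the emailEvents, allQueues, and connection instances from above:
// Sketch: close messaging in dependency order - consumers of the connection
// first (QueueEvents, then Queues), the shared ioredis connection last.
async function closeMessaging() {
  await emailEvents.close();
  await Promise.all(allQueues.map((queue) => queue.close()));
  await connection.quit();
}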
Step 4: HTTP Server Drain Pattern
While workers handle background jobs, your HTTP server handles incoming API requests. During shutdown, we need to:
- Stop accepting new connections
- Allow existing requests to complete
- Close the server gracefully
import { serve } from '@hono/node-server';
import { Hono } from 'hono';
const app = new Hono();
// Example: Long-running endpoint
app.post('/api/process', async (c) => {
const data = await c.req.json();
// This might take 10 seconds
const result = await performHeavyComputation(data);
return c.json({ result });
});
// Start server and keep reference
const server = serve({
fetch: app.fetch,
port: parseInt(process.env.PORT || '3000'),
});
console.log('Server running on port 3000');
async function drainHttpServer(timeoutMs: number = 10000) {
  console.log('Draining HTTP server');
  return new Promise<void>((resolve, reject) => {
    // Force-resolve after the timeout so a slow drain cannot block shutdown
    const forceTimer = setTimeout(() => {
      console.warn('HTTP server drain timeout, forcing close');
      resolve();
    }, timeoutMs);

    // Stop accepting new connections; in-flight requests are allowed to finish
    server.close((err) => {
      clearTimeout(forceTimer);
      if (err) {
        console.error('Error closing HTTP server:', err);
        reject(err);
      } else {
        console.log('HTTP server closed');
        resolve();
      }
    });
  });
}
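One caveat: server.close() only stops new connections, so idle keep-alive sockets can hold the server open until they time out. On Node 18.2+, http.Server exposes closeIdleConnections() and closeAllConnections(), which you can fold into the drain. A sketch, assuming the server returned by @hono/node-server is a plain http.Server:
import type { Server } from 'node:http';

// Sketch: proactively drop keep-alive sockets so close() can complete promptly.
function dropKeepAliveSockets(httpServer: Server, forceAfterMs: number) {
  // Close sockets that have no request in flight right now
  httpServer.closeIdleConnections();

  // After the grace window, drop everything that is still open
  setTimeout(() => httpServer.closeAllConnections(), forceAfterMs).unref();
}

// e.g. call dropKeepAliveSockets(server as Server, 5000) just before server.close()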
Express.js pattern:
import express from 'express';
const app = express();
const server = app.listen(3000);
async function drainExpressServer() {
return new Promise<void>((resolve, reject) => {
server.close((err) => {
if (err) reject(err);
else resolve();
});
});
}
Fastify pattern:
import Fastify from 'fastify';
const fastify = Fastify({ logger: true });
async function drainFastifyServer() {
// Fastify has built-in graceful shutdown
await fastify.close();
}
Fastify has built-in graceful shutdown support, making it one of the cleanest patterns to implement.
Step 5: Database Connection Cleanup
Database connections should be closed last because all previous steps might need database access to complete jobs or persist state.
import { Pool } from 'pg';
// PostgreSQL connection pool
const pool = new Pool({
host: process.env.DB_HOST,
port: parseInt(process.env.DB_PORT || '5432'),
database: process.env.DB_NAME,
user: process.env.DB_USER,
password: process.env.DB_PASSWORD,
max: 20, // Maximum 20 connections
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 2000,
});
async function closeDatabaseConnections() {
console.log('Closing database connections');
// Wait for all queries to finish
await pool.end();
console.log('Database connections closed');
}
// Prisma pattern
import { PrismaClient } from '@prisma/client';
const prisma = new PrismaClient();
async function closePrisma() {
await prisma.$disconnect();
}
// Mongoose pattern (MongoDB)
import mongoose from 'mongoose';
async function closeMongoose() {
await mongoose.connection.close();
}
Connection pool behavior during shutdown:
- New connection requests are rejected
- Active queries are allowed to complete
- Idle connections are closed immediately
- Pool waits for active connections to finish
- Timeout forces closure if queries hang
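Note that pg's pool.end() waits for checked-out clients to be released, so to get the "timeout forces closure" behavior you can bound it yourself. A minimal sketch:
// Sketch: bound pool.end() so a hung query cannot stall shutdown indefinitely.
async function closeDatabaseWithDeadline(deadlineMs = 5000) {
  const timedOut = new Promise<'timeout'>((resolve) =>
    setTimeout(() => resolve('timeout'), deadlineMs).unref(),
  );

  const result = await Promise.race([pool.end().then(() => 'closed' as const), timedOut]);
  if (result === 'timeout') {
    console.warn(`Database pool did not close within ${deadlineMs}ms`);
  }
}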
Complete Production Implementation
Here's a complete, production-ready graceful shutdown implementation:
import { Worker, Queue } from 'bullmq';
import { serve } from '@hono/node-server';
import { Hono } from 'hono';
import { Pool } from 'pg';
import IORedis from 'ioredis';
// ============================================================
// INFRASTRUCTURE SETUP
// ============================================================
// Redis connection
const redisConnection = new IORedis({
host: process.env.REDIS_HOST || 'localhost',
port: parseInt(process.env.REDIS_PORT || '6379'),
maxRetriesPerRequest: null,
retryStrategy: (times) => Math.min(times * 50, 2000),
});
// Database pool
const dbPool = new Pool({
host: process.env.DB_HOST,
database: process.env.DB_NAME,
user: process.env.DB_USER,
password: process.env.DB_PASSWORD,
max: 20,
});
// HTTP server
const app = new Hono();
app.get('/health', (c) => c.json({ status: 'healthy' }));
app.post('/api/jobs', async (c) => {
const data = await c.req.json();
// Add job to queue
await jobQueue.add('process', data);
return c.json({ status: 'queued' });
});
const httpServer = serve({
fetch: app.fetch,
port: parseInt(process.env.PORT || '3000'),
});
// ============================================================
// WORKER SETUP
// ============================================================
const allWorkers: Worker[] = [];
const allQueues: Queue[] = [];
// Create queue
const jobQueue = new Queue('jobs', { connection: redisConnection });
allQueues.push(jobQueue);
// Create worker
const jobWorker = new Worker(
'jobs',
async (job) => {
console.log(`Processing job ${job.id}`);
// Perform database operations
const client = await dbPool.connect();
try {
await client.query('BEGIN');
// Simulate work
await performBusinessLogic(job.data);
await client.query('COMMIT');
} catch (error) {
await client.query('ROLLBACK');
throw error;
} finally {
client.release();
}
console.log(`Job ${job.id} completed`);
},
{
connection: redisConnection,
concurrency: 10,
},
);
allWorkers.push(jobWorker);
console.log('Application started successfully');
// ============================================================
// GRACEFUL SHUTDOWN HANDLER
// ============================================================
let isShuttingDown = false;
async function gracefulShutdown(signal: string) {
// Prevent multiple shutdown attempts
if (isShuttingDown) {
console.log('Shutdown already in progress');
return;
}
isShuttingDown = true;
console.log(`Received ${signal}, starting graceful shutdown`);
const shutdownTimeout = setTimeout(() => {
console.error('Shutdown timeout exceeded, forcing exit');
process.exit(1);
}, 25000); // Less than Kubernetes' 30s default
try {
// 1. Stop accepting new work
console.log('Step 1: Closing workers');
await Promise.all(allWorkers.map((w) => w.close()));
console.log('✓ Workers closed');
// 2. Wait for ongoing jobs
console.log('Step 2: Waiting for ongoing jobs (5s grace period)');
await new Promise((resolve) => setTimeout(resolve, 5000));
console.log('✓ Grace period elapsed');
// 3. Close queues
console.log('Step 3: Closing queues');
await Promise.all(allQueues.map((q) => q.close()));
console.log('✓ Queues closed');
// 4. Drain HTTP server
console.log('Step 4: Draining HTTP server');
await new Promise<void>((resolve) => {
httpServer.close(() => {
console.log('✓ HTTP server closed');
resolve();
});
// Force close after 5 seconds
setTimeout(resolve, 5000);
});
// 5. Close database connections
console.log('Step 5: Closing database connections');
await dbPool.end();
console.log('✓ Database connections closed');
// 6. Close Redis connection
console.log('Step 6: Closing Redis connection');
await redisConnection.quit();
console.log('✓ Redis connection closed');
clearTimeout(shutdownTimeout);
console.log('Graceful shutdown complete');
process.exit(0);
} catch (error) {
console.error('Error during graceful shutdown:', error);
clearTimeout(shutdownTimeout);
process.exit(1);
}
}
// Register signal handlers
process.on('SIGTERM', () => gracefulShutdown('SIGTERM'));
process.on('SIGINT', () => gracefulShutdown('SIGINT'));
// Catch uncaught errors
process.on('uncaughtException', (error) => {
console.error('Uncaught exception:', error);
gracefulShutdown('uncaughtException');
});
process.on('unhandledRejection', (reason) => {
console.error('Unhandled rejection:', reason);
gracefulShutdown('unhandledRejection');
});
// Dummy business logic
async function performBusinessLogic(data: any) {
await new Promise((resolve) => setTimeout(resolve, 1000));
}
Testing Shutdown Behavior Locally
Testing graceful shutdown is crucial before deploying to production. Here's how to test locally:
Manual Testing with Signal Sending
# Terminal 1: Start your application
npm start
# Terminal 2: Find your process ID
pgrep -f "node dist/index.js"
# or
lsof -i :3000
# Send SIGTERM to specific process
kill -SIGTERM <PID>
# Test with SIGINT (Ctrl+C)
# Just press Ctrl+C in Terminal 1
Use kill with -SIGTERM to simulate what Kubernetes and Docker send during deployments. Always target the specific process ID rather than using pkill node, which would terminate all Node.js processes on your machine.
Docker Testing
# Dockerfile
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
COPY . .
# Important: Use exec form to ensure signals are forwarded
CMD ["node", "dist/index.js"]
The exec form matters: the shell form (CMD node dist/index.js without the JSON array) wraps your process in /bin/sh, which does not forward SIGTERM, so your shutdown handlers never run.
Test shutdown in Docker:
# Build and run
docker build -t myapp .
docker run --name myapp-test -p 3000:3000 myapp
# Send SIGTERM
docker stop myapp-test
# Check logs to verify graceful shutdown
docker logs myapp-test
# Test forced kill (SIGKILL after 10s)
docker stop -t 10 myapp-test
Kubernetes Testing
# pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp-test
spec:
  terminationGracePeriodSeconds: 30
  containers:
    - name: app
      image: myapp:latest
      ports:
        - containerPort: 3000
      env:
        - name: NODE_ENV
          value: production
      lifecycle:
        preStop:
          exec:
            # Optional: Additional cleanup before SIGTERM
            command: ['/bin/sh', '-c', 'sleep 5']
Test in Kubernetes:
# Deploy pod
kubectl apply -f pod.yaml
# Watch logs in one terminal
kubectl logs -f myapp-test
# Delete pod in another terminal (triggers SIGTERM)
kubectl delete pod myapp-test
# Verify graceful shutdown in logs
Automated Testing Script
// test-shutdown.ts
import { spawn, ChildProcess } from 'child_process';
async function testGracefulShutdown() {
console.log('Starting application...');
const app = spawn('node', ['dist/index.js'], {
env: { ...process.env, PORT: '3001' },
});
app.stdout.on('data', (data) => {
console.log(`[APP] ${data}`);
});
app.stderr.on('data', (data) => {
console.error(`[ERROR] ${data}`);
});
// Wait for startup
await new Promise((resolve) => setTimeout(resolve, 3000));
console.log('Sending SIGTERM...');
app.kill('SIGTERM');
// Track shutdown duration
const startTime = Date.now();
app.on('exit', (code, signal) => {
const duration = Date.now() - startTime;
console.log(`\nShutdown completed in ${duration}ms`);
console.log(`Exit code: ${code}`);
console.log(`Signal: ${signal}`);
if (code === 0 && duration < 25000) {
console.log('✓ Graceful shutdown successful');
} else {
console.error('✗ Graceful shutdown failed');
process.exit(1);
}
});
}
testGracefulShutdown().catch(console.error);
Run the test:
npx tsx test-shutdown.ts
Production Considerations
Monitoring Shutdown Metrics
Track these metrics to ensure graceful shutdown is working:
import { Counter, Histogram } from 'prom-client';
const shutdownDurationHistogram = new Histogram({
name: 'shutdown_duration_seconds',
help: 'Time taken for graceful shutdown',
buckets: [1, 5, 10, 15, 20, 25, 30],
});
const jobsInterruptedCounter = new Counter({
name: 'jobs_interrupted_total',
help: 'Number of jobs interrupted during shutdown',
});
async function monitoredShutdown(signal: string) {
const startTime = Date.now();
try {
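    // Note: for this metric to be recorded, gracefulShutdown must resolve here
    // instead of calling process.exit() itself; the value also needs to be
    // exported (for example, pushed) before the process finally exits.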
await gracefulShutdown(signal);
const duration = (Date.now() - startTime) / 1000;
shutdownDurationHistogram.observe(duration);
} catch (error) {
jobsInterruptedCounter.inc();
throw error;
}
}
Kubernetes Readiness Probes
Stop receiving traffic before shutdown using Kubernetes readiness probes:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: app
          image: myapp:latest
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /health/live
              port: 3000
            initialDelaySeconds: 15
            periodSeconds: 10
Implement health endpoints:
let isReady = true;
app.get('/health/ready', (c) => {
if (!isReady) {
return c.json({ status: 'not ready' }, 503);
}
return c.json({ status: 'ready' });
});
app.get('/health/live', (c) => {
return c.json({ status: 'alive' });
});
// In shutdown handler, mark as not ready first
async function gracefulShutdown(signal: string) {
isReady = false; // Stop receiving new traffic
await new Promise((resolve) => setTimeout(resolve, 5000)); // Let Kubernetes update
// Continue with normal shutdown...
}
Logging and Observability
Add structured logging with pino to track shutdown progress:
import pino from 'pino';
const logger = pino({
level: process.env.LOG_LEVEL || 'info',
formatters: {
level: (label) => ({ level: label }),
},
});
async function gracefulShutdown(signal: string) {
logger.warn({ signal }, 'Starting graceful shutdown');
try {
logger.info('Closing workers');
await stopWorkers();
logger.info({ workerCount: allWorkers.length }, 'Workers closed');
logger.info('Waiting for jobs');
await waitForJobs();
logger.info('Jobs completed');
// ... rest of shutdown
logger.info('Graceful shutdown complete');
process.exit(0);
} catch (error) {
logger.error({ error }, 'Shutdown error');
process.exit(1);
}
}
Key Takeaways
- Handle SIGTERM and SIGINT to avoid forceful SIGKILL termination
- Order matters: Stop workers → wait for jobs → close queues → drain HTTP → close database
- Set timeouts less than Kubernetes' grace period (default 30s)
- Use worker.close() from BullMQ for proper worker shutdown
- Track active jobs explicitly for better visibility
- Test locally with Docker and signal sending before production
- Monitor metrics to catch shutdown issues early
- Update readiness probes to stop receiving traffic during shutdown
Graceful shutdown is not optional for production Node.js applications. With proper implementation, you can achieve zero job loss during deployments, maintain data consistency, and provide a seamless experience for your users — even during the chaos of continuous deployment.
Ready to level up your Node.js reliability? Check out our guide on BullMQ step-based state machines for resilient job processing, explore handling child job failures for advanced error recovery patterns, or learn about production-ready structured logging with Pino for better shutdown observability.