An educational tool designed to help Azure support engineers practice diagnosing common Node.js performance problems. It intentionally generates controllable performance issues that mimic real-world scenarios.
- CPU Stress - Generate high CPU usage using child processes (child_process.fork())
- Memory Pressure - Allocate and retain memory to simulate leaks with stacking behavior
- Event Loop Blocking - Block the Node.js event loop with synchronous operations
- Slow Requests - Multiple blocking patterns: setTimeout, libuv thread pool saturation, worker threads
- Crash Simulation - Trigger FailFast, stack overflow, unhandled exceptions, or OOM
- Real-time Dashboard - Monitor metrics with live charts via WebSocket (Socket.IO)
# Clone the repository
git clone https://github.com/rhamlett/PerfSimNode.git
cd PerfSimNode
# Install dependencies
npm install
# Start in development mode
npm run dev
# Or build and run in production
npm run build
npm start

The server starts on http://localhost:3000 by default.
The application runs as a single Node.js process with Express.js, Socket.IO for real-time metrics, and in-memory state (no persistence).
src/
├── index.ts                    # Entry point
├── app.ts                      # Express app setup
│
├── controllers/                # API endpoints
│   ├── admin.controller.ts
│   ├── cpu.controller.ts
│   ├── crash.controller.ts
│   ├── eventloop.controller.ts
│   ├── health.controller.ts
│   ├── memory.controller.ts
│   ├── metrics.controller.ts
│   └── slow.controller.ts
│
├── services/                   # Business logic
│   ├── cpu-stress.service.ts
│   ├── crash.service.ts
│   ├── event-log.service.ts
│   ├── eventloop-block.service.ts
│   ├── memory-pressure.service.ts
│   ├── metrics.service.ts
│   ├── simulation-tracker.service.ts
│   └── slow-request.service.ts
│
├── middleware/                 # Express middleware
│   ├── error-handler.ts
│   ├── request-logger.ts
│   └── validation.ts
│
├── types/                      # TypeScript interfaces
│   └── index.ts
│
├── utils/                      # Utility functions
│   └── index.ts
│
├── config/                     # Configuration
│   └── index.ts
│
└── public/                     # Static dashboard
    ├── index.html              # Main dashboard
    ├── docs.html               # Documentation page
    ├── azure-diagnostics.html  # Azure diagnostics guide
    ├── favicon.svg
    ├── css/
    │   └── styles.css
    └── js/
        ├── charts.js           # Chart.js integration
        ├── dashboard.js        # UI interactions
        └── socket-client.js    # Socket.IO client
Understanding these Node.js-specific behaviors is essential for diagnosing performance issues:
Node.js runs JavaScript on a single main thread. I/O is asynchronous by default, but CPU-intensive synchronous code blocks the entire event loop:
- Blocked event loop = No requests processed, WebSocket heartbeats fail, health checks time out
- High CPU in child processes = May not affect latency (work is isolated)
- Memory pressure = Triggers garbage collection pauses, increasing event loop lag
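
Event loop lag can be observed directly with Node's built-in `perf_hooks` histogram. The following is a minimal sketch, not PerfSimNode's own measurement code; `nsToMs` and `busyWait` are hypothetical helper names, and the 200 ms block is an arbitrary value chosen for illustration.

```typescript
import { monitorEventLoopDelay } from "node:perf_hooks";

// The histogram reports nanoseconds; convert to whole milliseconds.
function nsToMs(ns: number): number {
  return Math.round(ns / 1e6);
}

// Spin synchronously for about `ms` milliseconds. While this runs,
// no timers, I/O callbacks, or incoming requests can be serviced.
function busyWait(ms: number): number {
  const start = Date.now();
  while (Date.now() - start < ms) {
    // intentionally empty: burn CPU on the main thread
  }
  return Date.now() - start;
}

// Sample the event loop delay every 10 ms.
const histogram = monitorEventLoopDelay({ resolution: 10 });
histogram.enable();

setTimeout(() => {
  busyWait(200); // simulate CPU-bound synchronous work
  // Give the loop a moment to record the stall, then report.
  setTimeout(() => {
    histogram.disable();
    console.log(`max event loop delay: ${nsToMs(histogram.max)} ms`);
  }, 50);
}, 50);
```

The reported maximum should land near the length of the synchronous block, which is the same relationship the dashboard's event loop lag chart shows during a blocking simulation.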
| Aspect | Node.js | .NET Core |
|---|---|---|
| Concurrency Model | Single thread + async I/O | Thread pool (many threads) |
| CPU-intensive work | Blocks all requests (unless in child process) | Affects individual threads |
| Scaling approach | Cluster mode / child processes | More threads |
| Memory per instance | Lower baseline (~30-50MB) | Higher baseline (~100-200MB) |
- JIT Compilation - Code is optimized at runtime; initial requests may be slower
- Garbage Collection - Automatic but can cause pauses (visible as event loop lag spikes)
- Heap Limit - Default old-space limit is roughly 1.5GB on older 64-bit builds and varies by Node.js version and available memory; configurable via --max-old-space-size
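
To check the limit that actually applies to a running process, the `v8` module exposes heap statistics. A small sketch, with `toMb` as a hypothetical helper:

```typescript
import * as v8 from "node:v8";

// Convert bytes to whole megabytes for display.
function toMb(bytes: number): number {
  return Math.round(bytes / (1024 * 1024));
}

// heap_size_limit is the effective V8 heap ceiling for this process,
// including any --max-old-space-size override passed at startup.
const stats = v8.getHeapStatistics();
console.log(`heap limit: ${toMb(stats.heap_size_limit)} MB`);
console.log(`heap used:  ${toMb(stats.used_heap_size)} MB`);
```

Running the same script with `node --max-old-space-size=2048` should report a correspondingly different limit.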
Implementation: Uses child_process.fork() to spawn separate OS processes that run crypto.pbkdf2Sync() in a tight loop. This ensures actual CPU utilization without blocking the main event loop.
Key characteristic: Server stays responsive during CPU stress - work is isolated in child processes.
curl -X POST http://localhost:3000/api/simulations/cpu \
-H "Content-Type: application/json" \
-d '{"targetLoadPercent": 75, "durationSeconds": 30}'

| Parameter | Range | Description |
|---|---|---|
| targetLoadPercent | 1-100 | Target CPU usage (spawns proportional workers) |
| durationSeconds | 1-300 | How long to run the simulation |
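
The pieces described above can be sketched as follows. This is an illustration under stated assumptions, not the code in cpu-stress.service.ts: `validateCpuParams`, `workerCountForLoad`, and `burnCpu` are hypothetical names, and the rule of one worker per fully loaded core is a guess at what "proportional workers" means. For brevity the burn loop runs inline here; in the real tool it would run inside a process created by `child_process.fork()` so the main event loop stays free.

```typescript
import { pbkdf2Sync } from "node:crypto";

interface CpuStressParams {
  targetLoadPercent: number; // documented range: 1-100
  durationSeconds: number; // documented range: 1-300
}

// Reject values outside the documented ranges.
function validateCpuParams(p: CpuStressParams): void {
  if (!Number.isFinite(p.targetLoadPercent) || p.targetLoadPercent < 1 || p.targetLoadPercent > 100) {
    throw new RangeError("targetLoadPercent must be between 1 and 100");
  }
  if (!Number.isFinite(p.durationSeconds) || p.durationSeconds < 1 || p.durationSeconds > 300) {
    throw new RangeError("durationSeconds must be between 1 and 300");
  }
}

// ASSUMPTION: one worker per fully loaded core, rounded, at least one worker.
function workerCountForLoad(targetLoadPercent: number, cpuCount: number): number {
  return Math.max(1, Math.round((cpuCount * targetLoadPercent) / 100));
}

// The loop a forked child might run: hash repeatedly until the deadline passes.
function burnCpu(durationMs: number): number {
  const deadline = Date.now() + durationMs;
  let rounds = 0;
  while (Date.now() < deadline) {
    pbkdf2Sync("password", "salt", 10000, 64, "sha512");
    rounds++;
  }
  return rounds;
}

validateCpuParams({ targetLoadPercent: 75, durationSeconds: 30 });
console.log(workerCountForLoad(75, 4)); // 3 workers on a 4-core machine
console.log(`hash rounds in 50 ms: ${burnCpu(50)}`);
```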
Implementation: Allocates Buffer objects filled with random data, held until explicitly released. Multiple allocations stack.
# Allocate memory
curl -X POST http://localhost:3000/api/simulations/memory \
-H "Content-Type: application/json" \
-d '{"sizeMb": 100}'
# Release memory (use the returned ID)
curl -X DELETE http://localhost:3000/api/simulations/memory/{id}

| Parameter | Range | Description |
|---|---|---|
| sizeMb | 1-500 | Memory to allocate in megabytes |
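
A minimal sketch of the allocate, retain, and release cycle, assuming an in-memory map keyed by ID. The function names and the use of `randomUUID` for IDs are illustrative choices, not taken from memory-pressure.service.ts.

```typescript
import { randomFillSync, randomUUID } from "node:crypto";

const MB = 1024 * 1024;

// Retained allocations keyed by ID. Holding the reference is what
// prevents the memory from being reclaimed.
const allocations = new Map<string, Buffer>();

function allocate(sizeMb: number): string {
  if (!Number.isFinite(sizeMb) || sizeMb < 1 || sizeMb > 500) {
    throw new RangeError("sizeMb must be between 1 and 500");
  }
  const buf = Buffer.allocUnsafe(Math.floor(sizeMb * MB));
  randomFillSync(buf); // writing random data touches every page
  const id = randomUUID();
  allocations.set(id, buf);
  return id;
}

// Drop the reference so the buffer becomes eligible for collection.
function release(id: string): boolean {
  return allocations.delete(id);
}

function totalAllocatedMb(): number {
  let bytes = 0;
  allocations.forEach((buf) => {
    bytes += buf.length;
  });
  return bytes / MB;
}

const first = allocate(1);
allocate(2); // a second allocation stacks on top of the first
console.log(totalAllocatedMb()); // 3
release(first);
console.log(totalAllocatedMb()); // 2
```

Buffer memory lives outside the V8 heap, so allocations like these show up in RSS and external memory rather than in heap-used figures.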
Implementation: Performs synchronous crypto.pbkdf2Sync() directly in the main thread, blocking ALL async operations.
⚠️ Warning: Server becomes completely unresponsive. Dashboard freezes. WebSocket may disconnect.
Key insight: Unlike CPU stress (child processes), this blocks THE thread. Event Loop Lag equals block duration.
curl -X POST http://localhost:3000/api/simulations/eventloop \
-H "Content-Type: application/json" \
-d '{"durationSeconds": 5}'

Symptoms to Observe:
- Event loop lag spikes to blocking duration
- ALL requests queue and complete together after unblock
- Probe dots turn red in dashboard
- Dashboard metrics stop updating during block
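
The symptoms above can be reproduced on a small scale. In this sketch a 50 ms heartbeat stands in for the dashboard's probes, and `blockEventLoop` is a hypothetical stand-in for the service's blocking call; the 300 ms block is shortened from the API's seconds for illustration.

```typescript
import { pbkdf2Sync } from "node:crypto";

// Block the calling thread with synchronous hashing until durationMs elapses.
function blockEventLoop(durationMs: number): number {
  const start = Date.now();
  while (Date.now() - start < durationMs) {
    pbkdf2Sync("password", "salt", 10000, 64, "sha512");
  }
  return Date.now() - start;
}

// A heartbeat that ticks every 50 ms while the loop is healthy.
let lastTick = Date.now();
let worstGapMs = 0;
const heartbeat = setInterval(() => {
  const now = Date.now();
  worstGapMs = Math.max(worstGapMs, now - lastTick);
  lastTick = now;
}, 50);

setTimeout(() => {
  const blockedMs = blockEventLoop(300);
  // Let the delayed heartbeat fire, then report.
  setTimeout(() => {
    clearInterval(heartbeat);
    console.log(`blocked ${blockedMs} ms, worst heartbeat gap ${worstGapMs} ms`);
  }, 100);
}, 120);
```

The worst heartbeat gap should be close to the block duration, because the ticks that were due during the block could not run until it finished.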
Implementation: Three blocking patterns available:
- setTimeout - Non-blocking delay (server stays responsive)
- libuv - Saturates libuv thread pool (affects fs/dns operations)
- worker - Spawns blocking worker threads (similar to .NET ThreadPool)
Key difference from Event Loop Blocking: With non-blocking patterns, only the slow endpoint is affected. Health probes and other requests complete normally.
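
The three patterns could be sketched as below. This is a hedged illustration rather than the contents of slow-request.service.ts: all function names are hypothetical, the job count and iteration values are arbitrary, and the worker is created from an inline string with `eval: true` only to keep the example in one file.

```typescript
import { pbkdf2 } from "node:crypto";
import { Worker } from "node:worker_threads";

type BlockingPattern = "setTimeout" | "libuv" | "worker";

// Parse the blockingPattern query value; the non-blocking pattern is the default.
function parsePattern(value: string | undefined): BlockingPattern {
  if (value === undefined) return "setTimeout";
  if (value === "setTimeout" || value === "libuv" || value === "worker") return value;
  throw new Error(`unknown blockingPattern: ${value}`);
}

// Pattern 1: a timer-based delay that leaves every thread free.
function delay(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Pattern 2: queue more async pbkdf2 jobs than the libuv pool has threads
// (4 by default, adjustable with UV_THREADPOOL_SIZE). Other pool users,
// such as fs calls and dns.lookup, wait behind them.
function saturateThreadPool(jobs: number, iterations: number): Promise<void> {
  const tasks: Promise<void>[] = [];
  for (let i = 0; i < jobs; i++) {
    tasks.push(
      new Promise<void>((resolve, reject) => {
        pbkdf2("password", "salt", iterations, 64, "sha512", (err) =>
          err ? reject(err) : resolve()
        );
      })
    );
  }
  return Promise.all(tasks).then(() => undefined);
}

// Pattern 3: spin on a worker thread; the main event loop is not involved.
function blockInWorker(ms: number): Promise<void> {
  const code = `const end = Date.now() + ${ms}; while (Date.now() < end) {}`;
  return new Promise((resolve, reject) => {
    const worker = new Worker(code, { eval: true });
    worker.on("error", reject);
    worker.on("exit", () => resolve());
  });
}

async function demo(): Promise<void> {
  const startedAt = Date.now();
  await Promise.all([delay(100), saturateThreadPool(8, 20000), blockInWorker(100)]);
  console.log(`all three patterns finished in ${Date.now() - startedAt} ms`);
}

demo().catch((err) => console.error(err));
```

None of the three blocks the main thread, which is why health probes keep completing while the slow endpoint waits.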
# Non-blocking (default)
curl "http://localhost:3000/api/simulations/slow?delaySeconds=10"
# libuv thread pool saturation
curl "http://localhost:3000/api/simulations/slow?delaySeconds=10&blockingPattern=libuv"
# Worker thread blocking
curl "http://localhost:3000/api/simulations/slow?delaySeconds=10&blockingPattern=worker"

Intentionally crashes the Node.js process for testing crash recovery.

⚠️ Warning: These operations terminate the process. Azure App Service auto-restarts.
| Type | Endpoint | Effect |
|---|---|---|
| FailFast | /crash/failfast | Immediate SIGABRT, core dump |
| Stack Overflow | /crash/stackoverflow | Call stack exceeded |
| Exception | /crash/exception | Unhandled exception |
| OOM | /crash/memory | Memory exhaustion |
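
For orientation, here is one plausible way each crash type could be triggered in plain Node.js. This is a sketch under assumptions, not the code in crash.service.ts; the `crashers` map and `overflowStack` are hypothetical names. Only the stack overflow is actually invoked, inside a try/catch, because the other three would terminate the process.

```typescript
type CrashType = "failfast" | "stackoverflow" | "exception" | "memory";

// Unbounded recursion: V8 throws a RangeError once the call stack is exhausted.
function overflowStack(depth: number = 0): number {
  return overflowStack(depth + 1) + 1;
}

const crashers: Record<CrashType, () => void> = {
  // process.abort() exits immediately and generates a core file.
  failfast: () => process.abort(),
  stackoverflow: () => {
    overflowStack();
  },
  // Thrown from a bare setImmediate callback so no handler can catch it.
  exception: () => {
    setImmediate(() => {
      throw new Error("Simulated unhandled exception");
    });
  },
  // Retain ever-growing arrays until V8 reaches its heap limit.
  memory: () => {
    const hoard: number[][] = [];
    while (true) {
      hoard.push(new Array<number>(1000000).fill(0));
    }
  },
};

try {
  crashers.stackoverflow();
} catch (err) {
  // The overflow is catchable, so a real crash has to let it escape.
  console.log(err instanceof RangeError); // true
}
```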
# Unhandled exception
curl -X POST http://localhost:3000/api/simulations/crash/exception
# Memory exhaustion (OOM)
curl -X POST http://localhost:3000/api/simulations/crash/memory

| Endpoint | Method | Description |
|---|---|---|
| /api/health | GET | Health check with uptime |
| /api/metrics/probe | GET | Lightweight probe for latency monitoring |
| /api/metrics | GET | Current system metrics |
| /api/simulations | GET | List active simulations |
| /api/simulations/cpu | POST | Start CPU stress (child processes) |
| /api/simulations/cpu/:id | DELETE | Stop CPU stress |
| /api/simulations/memory | POST | Allocate memory |
| /api/simulations/memory/:id | DELETE | Release memory |
| /api/simulations/eventloop | POST | Block event loop |
| /api/simulations/slow | GET | Slow request |
| /api/simulations/crash/failfast | POST | FailFast (SIGABRT) |
| /api/simulations/crash/stackoverflow | POST | Stack overflow |
| /api/simulations/crash/exception | POST | Unhandled exception |
| /api/simulations/crash/memory | POST | Memory exhaustion |
| /api/admin/status | GET | Admin status |
| /api/admin/events | GET | Event log |
| /api/admin/system-info | GET | System info (CPUs, memory, SKU) |
Connect via Socket.IO to receive real-time updates:
| Event | Frequency | Description |
|---|---|---|
| metrics | 1000ms | System metrics (CPU, memory, event loop) |
| probeLatency | 250ms / 2500ms | Request latency measurements |
| event | On occurrence | Simulation and system events |
| simulation | On status change | Simulation state updates |
Probe interval automatically increases from 250ms to 2500ms during slow request testing for cleaner diagnostics.
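
The two probe settings amount to a simple switch. A tiny sketch, with hypothetical constant and function names, showing what each setting means in probes per minute:

```typescript
const FAST_PROBE_MS = 250; // normal operation
const SLOW_PROBE_MS = 2500; // while a slow-request simulation is active

// Pick the probe interval based on whether a slow-request test is running.
function probeIntervalMs(slowRequestActive: boolean): number {
  return slowRequestActive ? SLOW_PROBE_MS : FAST_PROBE_MS;
}

function probesPerMinute(intervalMs: number): number {
  return 60000 / intervalMs;
}

console.log(probesPerMinute(probeIntervalMs(false))); // 240
console.log(probesPerMinute(probeIntervalMs(true))); // 24
```

Fewer probes during slow request tests means the probe traffic is less likely to be mistaken for the requests under test.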
| Environment Variable | Default | Description |
|---|---|---|
| PORT | 3000 | HTTP server port |
| METRICS_INTERVAL_MS | 1000 | Metrics broadcast interval |
| MAX_SIMULATION_DURATION_SECONDS | 300 | Maximum simulation duration |
| MAX_MEMORY_ALLOCATION_MB | 500 | Maximum memory allocation |
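
A configuration loader for these variables might look like the sketch below. The `AppConfig` shape, the helper names, and the choice to fall back silently on invalid values are assumptions for illustration, not a description of src/config/index.ts.

```typescript
interface AppConfig {
  port: number;
  metricsIntervalMs: number;
  maxSimulationDurationSeconds: number;
  maxMemoryAllocationMb: number;
}

type Env = Record<string, string | undefined>;

// Read a positive integer from the environment, using the documented
// default when the variable is missing or not a valid positive number.
function intFromEnv(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number.parseInt(raw, 10);
  return Number.isNaN(parsed) || parsed <= 0 ? fallback : parsed;
}

function loadConfig(env: Env): AppConfig {
  return {
    port: intFromEnv(env, "PORT", 3000),
    metricsIntervalMs: intFromEnv(env, "METRICS_INTERVAL_MS", 1000),
    maxSimulationDurationSeconds: intFromEnv(env, "MAX_SIMULATION_DURATION_SECONDS", 300),
    maxMemoryAllocationMb: intFromEnv(env, "MAX_MEMORY_ALLOCATION_MB", 500),
  };
}

console.log(loadConfig({ PORT: "8080" }).port); // 8080
console.log(loadConfig({}).maxMemoryAllocationMb); // 500
```

In an application, the same function would typically be called as `loadConfig(process.env)`.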
The application is designed for Azure App Service Linux with GitHub Actions OIDC deployment.
Quick Start:
- Create an App Service with Node.js 24 LTS and Linux
- Enable WebSockets in Configuration → General settings
- Set up GitHub OIDC authentication (no secrets needed!)
- Push to the `main` branch to deploy
See docs/azure-deployment.md for the complete step-by-step guide covering:
- App Service creation (Portal and CLI)
- Azure AD App Registration for GitHub OIDC
- Federated credentials configuration
- GitHub secrets setup
- Troubleshooting
The application includes a comprehensive Azure Diagnostics Guide accessible at /azure-diagnostics.html when running. It covers:
- Understanding metrics (CPU, memory, event loop lag, latency)
- Node.js vs .NET concurrency model differences
- Step-by-step diagnostic workflows for each simulation
- Azure App Service Diagnostics, Application Insights, and Kudu
- Linux diagnostic tools and commands
- Ready-to-use AppLens/KQL queries
# Run in development with hot reload
npm run dev
# Run tests
npm test
# Run linting
npm run lint
# Format code
npm run format

License: MIT
Created by SpecKit in collaboration with Richard Hamlett