Performance Testing & Benchmarking
Test your Order Accuracy pipeline performance on various hardware configurations. This guide covers everything from quick performance checks to comprehensive system capacity testing.
Quick Start (5 minutes)
Goal: Run a basic performance test to verify your system works correctly
1. Initialize Performance Tools
make update-submodules
2. Run Quick Benchmark
Dine-In:
cd dine-in
make benchmark
Take-Away:
cd take-away
make benchmark
What this does:
- Tests GPU/CPU performance for order validation
- Measures end-to-end latency
- Generates performance metrics
- Outputs results to results/ directory
Understanding Benchmark Types
Dine-In Benchmarks
Single Request Benchmark
make benchmark
Tests single-image validation latency:
- Image preprocessing time
- VLM inference time
- Semantic matching time
- Total end-to-end latency
Stream Density Benchmark
make benchmark-density
Finds the maximum number of concurrent requests the system can handle under latency constraints:
- Target latency threshold (configurable)
- Progressive load increase
- Identifies performance ceiling
Take-Away Benchmarks
Single Video Benchmark
make benchmark
Tests end-to-end latency for single order validation:
- Video upload time
- Frame extraction time
- VLM inference latency
- Validation time
- Total processing time
Fixed Workers Benchmark
make benchmark-oa BENCHMARK_WORKERS=4 BENCHMARK_DURATION=300
Tests the system with a fixed number of concurrent workers:
- Throughput (orders/minute)
- Latency percentiles (P50, P95, P99)
- GPU utilization
- Memory usage
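To make the throughput and percentile metrics concrete, here is a minimal Python sketch of how they can be derived from per-order latencies. The helper names `percentile` and `summarize_latencies`, the input format (a list of latencies in milliseconds plus the test duration), and the nearest-rank percentile method are all assumptions for illustration; the benchmark tooling may compute these values differently.

```python
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list (0 < pct <= 100)."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize_latencies(latencies_ms, duration_s):
    """Summarize one fixed-worker run (hypothetical helper)."""
    ordered = sorted(latencies_ms)
    return {
        # Completed orders scaled to a one-minute window.
        "orders_per_minute": len(ordered) * 60 / duration_s,
        "p50_ms": percentile(ordered, 50),
        "p95_ms": percentile(ordered, 95),
        "p99_ms": percentile(ordered, 99),
    }


if __name__ == "__main__":
    # Made-up sample latencies, not measured data.
    sample = [9000, 9500, 10000, 10500, 11000,
              12000, 12500, 13000, 14000, 21000]
    print(summarize_latencies(sample, duration_s=60))
```

With only a handful of samples, P95 and P99 collapse onto the slowest request, which is one reason longer runs (a larger BENCHMARK_DURATION) give more meaningful tail latencies.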
Stream Density Benchmark
make benchmark-stream-density
Finds the maximum sustainable worker count under latency constraints:
- Maximum concurrent workers
- Latency at each worker count
- Point of degradation
- Resource utilization at capacity
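The progressive search described above can be sketched in a few lines of Python. This is an illustration of the general idea only, not the benchmark's actual implementation: `find_max_workers` and the `measure_latency_ms` callable are hypothetical names, and starting at one worker is an assumption.

```python
def find_max_workers(measure_latency_ms, target_latency_ms,
                     worker_increment=1, max_iterations=50):
    """Raise the worker count until latency exceeds the target.

    measure_latency_ms is a hypothetical callable that takes a worker
    count and returns the measured latency (ms) at that load level.
    Returns (max_sustainable_workers, history).
    """
    history = []   # (workers, latency_ms) for every level tried
    best = 0       # 0 means even the first level missed the target
    workers = 1    # assumption: the search starts with a single worker
    for _ in range(max_iterations):
        latency = measure_latency_ms(workers)
        history.append((workers, latency))
        if latency > target_latency_ms:
            break  # point of degradation reached
        best = workers
        workers += worker_increment
    return best, history


if __name__ == "__main__":
    # Toy latency model (an assumption): 8 s base plus 3 s per extra worker.
    def fake_latency(workers):
        return 8000 + 3000 * (workers - 1)

    best, history = find_max_workers(fake_latency, target_latency_ms=25000)
    print("max sustainable workers:", best)
    for workers, latency in history:
        print(workers, latency)
```

The returned history corresponds to the "latency at each worker count" output, and its last entry is the first load level that broke the target.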
Environment Variables Reference
Dine-In Configuration
| Variable | Default | Description |
|---|---|---|
| TARGET_LATENCY_MS | 15000 | Target latency threshold (ms) |
| LATENCY_METRIC | avg | 'avg', 'p95', or 'max' |
| DENSITY_INCREMENT | 1 | Concurrent images per iteration |
| INIT_DURATION | 60 | Warmup time (seconds) |
| MIN_REQUESTS | 3 | Min requests before measuring |
| REQUEST_TIMEOUT | 300 | Individual request timeout (seconds) |
| API_ENDPOINT | http://localhost:8083 | API endpoint URL |
| RESULTS_DIR | ./results | Results output directory |
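The choice of LATENCY_METRIC changes which runs count as meeting TARGET_LATENCY_MS. The Python sketch below illustrates the difference; the function names, the nearest-rank P95 method, and the "less than or equal" comparison are assumptions made for illustration, not the tool's confirmed behavior.

```python
import math
import statistics


def latency_for_metric(latencies_ms, metric="avg"):
    """Reduce a list of request latencies to one number for the metric."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    if metric == "avg":
        return statistics.fmean(latencies_ms)
    if metric == "max":
        return max(latencies_ms)
    if metric == "p95":
        ordered = sorted(latencies_ms)
        rank = math.ceil(0.95 * len(ordered))  # nearest-rank method
        return ordered[rank - 1]
    raise ValueError(f"unknown metric: {metric!r}")


def meets_target(latencies_ms, metric="avg", target_latency_ms=15000):
    """True when the chosen metric is within the target (assumed <=)."""
    return latency_for_metric(latencies_ms, metric) <= target_latency_ms


if __name__ == "__main__":
    # Made-up samples: three fast requests and one slow outlier.
    samples = [9000, 10000, 11000, 30000]
    for metric in ("avg", "p95", "max"):
        print(metric, latency_for_metric(samples, metric),
              meets_target(samples, metric))
```

In this example the average sits exactly on the 15000 ms default while P95 and max are dominated by the single outlier, so 'avg' is the most lenient setting and 'max' the strictest.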
Take-Away Configuration
| Variable | Default | Description |
|---|---|---|
| TARGET_LATENCY_MS | 25000 | Target latency threshold (ms) |
| LATENCY_METRIC | avg | 'avg', 'p95', or 'max' |
| WORKER_INCREMENT | 1 | Workers added per iteration |
| INIT_DURATION | 10 | Warmup time (seconds) |
| MIN_TRANSACTIONS | 3 | Min transactions before measuring |
| MAX_ITERATIONS | 50 | Max scaling iterations |
| MAX_WAIT_SEC | 600 | Max wait per iteration (seconds) |
| BENCHMARK_WORKERS | 1 | Number of workers (fixed mode) |
| BENCHMARK_DURATION | 60 | Test duration (seconds) |
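If you wrap the benchmarks in your own scripts, it can help to resolve these variables in one place. The following Python sketch reads the Take-Away variables from the environment and falls back to the defaults listed in the table; `load_config` and `TAKE_AWAY_DEFAULTS` are hypothetical names, and the validation rules are assumptions rather than what the benchmark itself enforces.

```python
import os

# Defaults copied from the Take-Away configuration table.
TAKE_AWAY_DEFAULTS = {
    "TARGET_LATENCY_MS": 25000,
    "LATENCY_METRIC": "avg",
    "WORKER_INCREMENT": 1,
    "INIT_DURATION": 10,
    "MIN_TRANSACTIONS": 3,
    "MAX_ITERATIONS": 50,
    "MAX_WAIT_SEC": 600,
    "BENCHMARK_WORKERS": 1,
    "BENCHMARK_DURATION": 60,
}
VALID_METRICS = {"avg", "p95", "max"}


def load_config(env=None):
    """Return the effective settings; env defaults to os.environ."""
    env = os.environ if env is None else env
    config = {}
    for name, default in TAKE_AWAY_DEFAULTS.items():
        raw = env.get(name)
        if raw is None:
            config[name] = default
        elif isinstance(default, int):
            config[name] = int(raw)  # raises ValueError on bad input
        else:
            config[name] = raw
    if config["LATENCY_METRIC"] not in VALID_METRICS:
        raise ValueError("LATENCY_METRIC must be 'avg', 'p95', or 'max'")
    return config


if __name__ == "__main__":
    for name, value in load_config().items():
        print(f"{name}={value}")
```

Passing a plain dictionary instead of the real environment makes the override behavior easy to check, for example `load_config({"BENCHMARK_WORKERS": "4"})`.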
Hardware Testing Commands
GPU Performance Testing
Dine-In:
# Ensure GPU device is configured in .env
# OPENVINO_DEVICE=GPU
make benchmark
Take-Away:
# Configure GPU in .env
# OPENVINO_DEVICE=GPU
make benchmark-oa BENCHMARK_WORKERS=4
Multi-Worker Stress Testing (Take-Away)
# Test with 2 parallel workers
make up-parallel WORKERS=2
make benchmark-oa BENCHMARK_WORKERS=2
# High stress test with 8 workers
make up-parallel WORKERS=8
make benchmark-oa BENCHMARK_WORKERS=8
Progressive Load Testing
# Automatically find maximum sustainable workers
make benchmark-stream-density \
BENCHMARK_TARGET_LATENCY_MS=25000 \
BENCHMARK_WORKER_INCREMENT=1 \
BENCHMARK_MAX_ITERATIONS=20
Viewing Results
Dine-In Results
# View density benchmark results
make benchmark-density-results
# View raw results
cat results/benchmark_results.json
ls -la results/
Take-Away Results
# View benchmark results
make benchmark-oa-results
# View density results
cat results/stream_density_results.json
ls -la results/
Consolidate Metrics
make consolidate-metrics
cat results/metrics_summary.csv
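For a quick overview of what a run produced, a short Python script can list the JSON files in the results directory together with their top-level keys. The schema of the result files is not documented here, so this sketch deliberately assumes nothing about field names; `describe_results` is a hypothetical helper.

```python
import json
from pathlib import Path


def describe_results(results_dir="results"):
    """Map each JSON file name in results_dir to its top-level shape."""
    summary = {}
    for path in sorted(Path(results_dir).glob("*.json")):
        with path.open() as handle:
            data = json.load(handle)
        if isinstance(data, dict):
            summary[path.name] = sorted(data)         # top-level keys
        else:
            summary[path.name] = type(data).__name__  # e.g. "list"
    return summary


if __name__ == "__main__":
    for name, shape in describe_results().items():
        print(name, shape)
```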
Expected Performance
Typical Latency Ranges
| Operation | Dine-In | Take-Away |
|---|---|---|
| Image Preprocessing | 100-500ms | N/A |
| Frame Selection | N/A | 200-500ms |
| VLM Inference | 5-10s | 5-10s |
| Semantic Matching | 50-200ms | 50-200ms |
| Total End-to-End | 8-15s | 8-15s per order |
Hardware Impact
| Configuration | Typical Performance |
|---|---|
| CPU Only | 15-25s per validation |
| Intel iGPU | 8-15s per validation |
| Intel Arc dGPU | 5-10s per validation |
| NVIDIA RTX | 4-8s per validation |
Throughput Expectations
| Mode | Expected Throughput |
|---|---|
| Dine-In Single | 4-6 orders/minute |
| Take-Away Single | 4-6 orders/minute |
| Take-Away Parallel (4 workers) | 16-24 orders/minute |
| Take-Away Parallel (8 workers) | 30-40 orders/minute |
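The throughput figures follow from the latency ranges by simple arithmetic: one worker that needs 10-15 s per order completes 4-6 orders per minute. The Python sketch below reproduces that calculation and estimates how close the parallel figures are to linear scaling. The function names are illustrative, and the model assumes every worker processes orders back to back with no idle time.

```python
def orders_per_minute(latency_s, workers=1):
    """Ideal throughput when each worker handles one order at a time."""
    return workers * 60.0 / latency_s


def scaling_efficiency(parallel_opm, single_opm, workers):
    """Fraction of perfect linear scaling achieved (1.0 = ideal)."""
    return parallel_opm / (single_opm * workers)


if __name__ == "__main__":
    # Single worker at 15 s and 10 s per order.
    print(orders_per_minute(15), orders_per_minute(10))
    # 4 workers: the table's 16-24 orders/minute is exactly 4x the single rate.
    print(scaling_efficiency(16, 4, 4), scaling_efficiency(24, 6, 4))
    # 8 workers: the table's 30-40 orders/minute falls below 8x the single rate.
    print(scaling_efficiency(30, 4, 8), scaling_efficiency(40, 6, 8))
```

Under these assumptions the 4-worker range corresponds to linear scaling, while the 8-worker range implies roughly 83-94% efficiency, which is consistent with contention for shared GPU and memory resources at higher worker counts.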
Optimization Tips
GPU Utilization
- Monitor GPU usage with nvidia-smi -l 1 or intel_gpu_top
- Target 70-90% GPU utilization for optimal throughput
- If GPU is underutilized, increase worker count
Memory Management
- Monitor container memory with docker stats
- VLM models require 8-16GB GPU memory
- Reduce batch size if out-of-memory errors occur
Network Optimization (Take-Away)
- Use wired connections for RTSP streams
- Ensure 1Gbps+ network bandwidth per camera
- Consider local video storage for testing
Latency Reduction
- Use INT8 model quantization
- Enable HTTP/2 for API connections
- Pre-warm VLM model before benchmarking
Troubleshooting Performance Issues
Low FPS / High Latency
- Check GPU driver installation
- Verify OPENVINO_DEVICE setting in .env
- Reduce image resolution or batch size
- Check for thermal throttling
VLM Timeout Errors
- Increase API_TIMEOUT in .env
- Check GPU memory availability
- Consider using a lower model precision (for example, INT8 quantization)
Memory Exhaustion
- Reduce number of parallel workers
- Lower batch size settings
- Monitor with docker stats
Inconsistent Results
- Increase warmup duration (INIT_DURATION)
- Increase the minimum sample count before measuring (MIN_TRANSACTIONS for Take-Away, MIN_REQUESTS for Dine-In)
- Run multiple benchmark iterations
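When comparing several benchmark iterations, the spread between runs is a useful signal of whether results are stable. The Python sketch below computes the mean, sample standard deviation, and coefficient of variation across runs; `run_variability` is a hypothetical helper, and any threshold you apply to the result (for example, treating more than 10% variation as unstable) is a judgment call rather than a documented rule.

```python
import statistics


def run_variability(run_latencies_ms):
    """Summarize the spread of one latency figure across repeated runs."""
    if not run_latencies_ms:
        raise ValueError("need at least one run")
    mean = statistics.mean(run_latencies_ms)
    # The sample standard deviation needs at least two runs.
    if len(run_latencies_ms) > 1:
        stdev = statistics.stdev(run_latencies_ms)
    else:
        stdev = 0.0
    return {"mean_ms": mean, "stdev_ms": stdev, "cv": stdev / mean}


if __name__ == "__main__":
    # Made-up average latencies from three runs, not measured data.
    print(run_variability([9800, 10400, 10100]))
```

A high coefficient of variation suggests lengthening the warmup or collecting more samples per run before drawing conclusions from any single result.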