
Performance Testing & Benchmarking

Test your Order Accuracy pipeline performance on various hardware configurations. This guide covers everything from quick performance checks to comprehensive system capacity testing.

Quick Start (5 minutes)

Goal: Run a basic performance test to verify your system works correctly

1. Initialize Performance Tools

make update-submodules

2. Run Quick Benchmark

Dine-In:

cd dine-in
make benchmark

Take-Away:

cd take-away
make benchmark

What this does:
  • Tests GPU/CPU performance for order validation
  • Measures end-to-end latency
  • Generates performance metrics
  • Outputs results to the results/ directory

Understanding Benchmark Types

Dine-In Benchmarks

Single Request Benchmark

make benchmark

Tests single image validation latency:
  • Image preprocessing time
  • VLM inference time
  • Semantic matching time
  • Total end-to-end latency

Stream Density Benchmark

make benchmark-density

Finds maximum concurrent requests the system can handle under latency constraints:
  • Target latency threshold (configurable)
  • Progressive load increase
  • Identifies the performance ceiling

Take-Away Benchmarks

Single Video Benchmark

make benchmark

Tests end-to-end latency for single order validation:
  • Video upload time
  • Frame extraction time
  • VLM inference latency
  • Validation time
  • Total processing time

Fixed Workers Benchmark

make benchmark-oa BENCHMARK_WORKERS=4 BENCHMARK_DURATION=300

Tests the system with a fixed number of concurrent workers and reports:
  • Throughput (orders/minute)
  • Latency percentiles (P50, P95, P99)
  • GPU utilization
  • Memory usage

Stream Density Benchmark

make benchmark-stream-density

Finds maximum sustainable worker count under latency constraints:
  • Maximum concurrent workers
  • Latency at each worker count
  • Point of degradation
  • Resource utilization at capacity

Environment Variables Reference

Dine-In Configuration

Variable            Default                Description
TARGET_LATENCY_MS   15000                  Target latency threshold (ms)
LATENCY_METRIC      avg                    'avg', 'p95', or 'max'
DENSITY_INCREMENT   1                      Concurrent images per iteration
INIT_DURATION       60                     Warmup time (seconds)
MIN_REQUESTS        3                      Min requests before measuring
REQUEST_TIMEOUT     300                    Individual request timeout (seconds)
API_ENDPOINT        http://localhost:8083  API endpoint URL
RESULTS_DIR         ./results              Results output directory
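
These defaults can be overridden per run. The sketch below assumes the Makefile picks the variables up from the environment or its command line; if your checkout reads them from .env, set them there instead:

# Example: a stricter dine-in density run driven by p95 latency
make benchmark-density \
  TARGET_LATENCY_MS=10000 \
  LATENCY_METRIC=p95 \
  REQUEST_TIMEOUT=600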

Take-Away Configuration

Variable             Default  Description
TARGET_LATENCY_MS    25000    Target latency threshold (ms)
LATENCY_METRIC       avg      'avg', 'p95', or 'max'
WORKER_INCREMENT     1        Workers added per iteration
INIT_DURATION        10       Warmup time (seconds)
MIN_TRANSACTIONS     3        Min transactions before measuring
MAX_ITERATIONS       50       Max scaling iterations
MAX_WAIT_SEC         600      Max wait per iteration (seconds)
BENCHMARK_WORKERS    1        Number of workers (fixed mode)
BENCHMARK_DURATION   60       Test duration (seconds)
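
As with dine-in, these can be overridden per run. Note that the progressive load example later in this guide uses BENCHMARK_-prefixed names (e.g. BENCHMARK_TARGET_LATENCY_MS); which form your Makefile forwards is checkout-specific, so treat this as an illustrative sketch:

# Example: a shorter take-away density sweep with a longer warmup
make benchmark-stream-density \
  TARGET_LATENCY_MS=20000 \
  INIT_DURATION=30 \
  MAX_ITERATIONS=10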

Hardware Testing Commands

GPU Performance Testing

Dine-In:

# Ensure GPU device is configured in .env
# OPENVINO_DEVICE=GPU
make benchmark

Take-Away:

# Configure GPU in .env
# OPENVINO_DEVICE=GPU
make benchmark-oa BENCHMARK_WORKERS=4

Multi-Worker Stress Testing (Take-Away)

# Test with 2 parallel workers
make up-parallel WORKERS=2
make benchmark-oa BENCHMARK_WORKERS=2

# High stress test with 8 workers
make up-parallel WORKERS=8
make benchmark-oa BENCHMARK_WORKERS=8

Progressive Load Testing

# Automatically find maximum sustainable workers
make benchmark-stream-density \
  BENCHMARK_TARGET_LATENCY_MS=25000 \
  BENCHMARK_WORKER_INCREMENT=1 \
  BENCHMARK_MAX_ITERATIONS=20

Viewing Results

Dine-In Results

# View density benchmark results
make benchmark-density-results

# View raw results
cat results/benchmark_results.json
ls -la results/
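
If jq is installed, you can also pull the latency- and throughput-related fields out of the raw JSON without hard-coding its schema (field names can vary between benchmark versions):

# Pretty-print the results and filter for latency/throughput fields
jq . results/benchmark_results.json | grep -iE 'latency|throughput'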

Take-Away Results

# View benchmark results
make benchmark-oa-results

# View density results
cat results/stream_density_results.json
ls -la results/

Consolidate Metrics

make consolidate-metrics
cat results/metrics_summary.csv
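
The consolidated CSV is easier to scan as aligned columns; the standard column utility handles this:

# Render the metrics summary as aligned columns (press q to exit)
column -s, -t < results/metrics_summary.csv | less -S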

Expected Performance

Typical Latency Ranges

Operation            Dine-In    Take-Away
Image Preprocessing  100-500ms  N/A
Frame Selection      N/A        200-500ms
VLM Inference        5-10s      5-10s
Semantic Matching    50-200ms   50-200ms
Total End-to-End     8-15s      8-15s per order

Hardware Impact

Configuration   Typical Performance
CPU Only        15-25s per validation
Intel iGPU      8-15s per validation
Intel Arc dGPU  5-10s per validation
NVIDIA RTX      4-8s per validation

Throughput Expectations

Mode                            Expected Throughput
Dine-In Single                  4-6 orders/minute
Take-Away Single                4-6 orders/minute
Take-Away Parallel (4 workers)  16-24 orders/minute
Take-Away Parallel (8 workers)  30-40 orders/minute

Optimization Tips

GPU Utilization

  • Monitor GPU usage with nvidia-smi -l 1 or intel_gpu_top (a logging variant is shown below)
  • Target 70-90% GPU utilization for optimal throughput
  • If GPU is underutilized, increase worker count
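
If you want a utilization trace rather than a live view, nvidia-smi can log samples while a benchmark runs (NVIDIA only; the output filename below is just an example):

# Sample GPU utilization and memory once per second during a benchmark run
nvidia-smi --query-gpu=timestamp,utilization.gpu,memory.used --format=csv -l 1 > results/gpu_utilization.csv &
SAMPLER_PID=$!
make benchmark-oa BENCHMARK_WORKERS=4
kill "$SAMPLER_PID"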

Memory Management

  • Monitor container memory with docker stats (a one-shot variant is shown below)
  • VLM models require 8-16GB GPU memory
  • Reduce batch size if out-of-memory errors occur
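
For a scripted snapshot rather than the live view, docker stats accepts --no-stream:

# One-shot snapshot of per-container CPU and memory usage
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"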

Network Optimization (Take-Away)

  • Use wired connections for RTSP streams
  • Ensure 1Gbps+ network bandwidth per camera
  • Consider local video storage for testing

Latency Reduction

  • Use INT8 model quantization
  • Enable HTTP/2 for API connections
  • Pre-warm VLM model before benchmarking
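
A warm-up can be as simple as sending one representative request before the timed run. The endpoint path and payload below are placeholders only; substitute whatever your deployment actually exposes:

# Hypothetical warm-up request; replace /validate and the payload with your
# deployment's actual endpoint and input
curl -s -o /dev/null -w "warm-up took %{time_total}s\n" \
  -X POST "http://localhost:8083/validate" \
  -F "image=@sample_order.jpg"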

Troubleshooting Performance Issues

Low FPS / High Latency

  • Check GPU driver installation
  • Verify the OPENVINO_DEVICE setting in .env (see the check below)
  • Reduce image resolution or batch size
  • Check for thermal throttling
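
A quick first check for device configuration:

# Confirm which OpenVINO device the pipeline is configured to use
grep OPENVINO_DEVICE .env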

VLM Timeout Errors

  • Increase API_TIMEOUT in .env
  • Check GPU memory availability
  • Consider using smaller model precision

Memory Exhaustion

  • Reduce number of parallel workers
  • Lower batch size settings
  • Monitor with docker stats

Inconsistent Results

  • Increase warmup duration (INIT_DURATION)
  • Increase minimum transactions (MIN_TRANSACTIONS); both are shown in the example after this list
  • Run multiple benchmark iterations
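
For example, a more conservative take-away run (assuming the Makefile forwards these variables; dine-in uses MIN_REQUESTS instead of MIN_TRANSACTIONS):

# Longer warmup and more samples per iteration for steadier measurements
make benchmark-stream-density INIT_DURATION=60 MIN_TRANSACTIONS=10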