Live GPU
COMPUTE %
MEM BW %
TEMPERATURE
POWER
VRAM Allocation
Default phase
Optimized phase
Settling / idle
GPU during benchmark phases
No benchmark data yet
Configure the servers in the left panel and click Run Benchmark. Results update live.
Key Metrics
Latency Breakdown
End-to-End Latency (ms) — lower is better
Time to First Token / TTFT (ms) — lower is better
Distributions
Latency Distribution: all requests sorted ascending, both servers (series: Default, Optimized)
TTFT Distribution: time to first token per request, sorted ascending (series: Default, Optimized)
Percentiles & Throughput
Latency Percentiles (ms): P50 · P90 · P99 (series: Default, Optimized)
Throughput: req/s & tok/s ÷ 10, higher is better (series: Default, Optimized)
Scheduler State: running / waiting / swapped sequences (series: Default, Optimized)
Prometheus / vLLM Engine Metrics
Active Configuration
Live Log
Log output will appear here when a benchmark or config apply runs.