BENCHMARK REPORT · ENGINE V0.2.0 · 2026-05-20 · APPLE SILICON M3

Performance

Metric	Value	Notes
Engine p50	0.38 µs	Tier 1 isolated matching
Throughput	2.5M ops/sec	Engine isolation (see Table 1)
E2E p50	16.9 ms	Single client, loopback
Tests	180 / 1800	failures

Tier 1 — Engine Microbenchmark · Pure in-process matching loop

Engine-layer throughput — ops/sec (isolation benchmark, see Table 1)

Latency breakdown — µs per operation

Engine component cost breakdown — ns per operation

Tier 2 — End-to-End HTTP Latency · Full request lifecycle, 127.0.0.1 loopback

End-to-end latency — p50 and p99 (ms) · Vela loopback scenarios only · See Table 2 for Hyperliquid system figures
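The latency figures in this tier are percentiles of per-request samples. The report does not say how its percentiles are computed, so the following is a minimal Rust sketch that assumes the common nearest-rank definition. `percentile` and `summarize` are hypothetical helpers, and the samples are illustrative, not data from this report.

```rust
/// Nearest-rank percentile over samples sorted in ascending order.
/// `q` is the percentile in parts per ten thousand (5_000 = p50,
/// 9_900 = p99, 9_990 = p99.9, 9_999 = p99.99), so the rank uses exact integer math.
fn percentile(sorted: &[f64], q: usize) -> f64 {
    assert!(!sorted.is_empty(), "need at least one sample");
    assert!((1..=10_000).contains(&q), "q must be in 1..=10_000");
    // 1-based nearest rank: ceil(q * n / 10_000).
    let rank = (q * sorted.len() + 9_999) / 10_000;
    sorted[rank - 1]
}

/// Returns [p50, p99, p99.9, p99.99] for a set of latency samples (ms).
fn summarize(mut samples_ms: Vec<f64>) -> [f64; 4] {
    // Sort once so every percentile indexes into the same ordered data.
    samples_ms.sort_by(f64::total_cmp);
    [5_000, 9_900, 9_990, 9_999].map(|q| percentile(&samples_ms, q))
}

fn main() {
    // Hypothetical samples, not measurements from this report.
    let samples: Vec<f64> = (1..=1_000).map(|ms| ms as f64).collect();
    let [p50, p99, p999, p9999] = summarize(samples);
    println!("p50={p50} ms, p99={p99} ms, p99.9={p999} ms, p99.99={p9999} ms");
}
```

Because every percentile indexes into the same sorted slice, this method always yields p50 ≤ p99 ≤ p99.9 ≤ p99.99.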

Scenario metrics

Scenario	p50	p99	p99.9	p99.99	Throughput
S1: Single-threaded sequential	16.9 ms	19.9 ms	25.2 ms	87.0 ms	59 ops/sec
S2: Concurrent-32	159 ms	307 ms	235 ms	240 ms	192 ops/sec
S3: Burst-1000	3,871 ms	5,444 ms	—	—	181 ops/sec
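For the steady-state scenarios, throughput and latency should be tied together by Little's law (throughput = requests in flight / latency). This minimal Rust sketch checks S1 and S2 under the assumption that each client always has exactly one request outstanding. `littles_law_ops_per_sec` is a hypothetical helper, and because Little's law uses mean latency while the table reports p50, the result is only an approximation. S3 is a burst rather than a steady state, so it is left out.

```rust
/// Little's law rearranged for a closed-loop load generator:
/// throughput = requests in flight / latency.
/// Assumption: each client always has exactly one request outstanding.
fn littles_law_ops_per_sec(in_flight: u32, latency_ms: f64) -> f64 {
    f64::from(in_flight) / (latency_ms / 1_000.0)
}

fn main() {
    // p50 latencies from the scenario table; Little's law strictly needs the
    // mean latency, so these are approximations.
    let s1 = littles_law_ops_per_sec(1, 16.9); // table reports 59 ops/sec
    let s2 = littles_law_ops_per_sec(32, 159.0); // table reports 192 ops/sec
    println!("S1 ≈ {s1:.0} ops/sec, S2 ≈ {s2:.0} ops/sec");
}
```

This gives about 59 ops/sec for S1, in line with the table, and about 201 ops/sec for S2. Read the other way, 32 clients at the measured 192 ops/sec imply a mean latency of about 167 ms, slightly above the 159 ms median.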

Tier 3 — Sustained Throughput · 300 seconds · 5 market-maker agents · 11 markets

Throughput time series — ops/sec per 5-second bucket

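The ~40% throughput drop at t=120 s is expected: each market-maker (MM) agent exhausts its resting-order queue within two minutes. After that, the dominant operation shifts from placing new orders (one round-trip) to cancel-repost (two round-trips). Because each cancel-repost needs twice as many round-trips, fewer operations complete per second (up to half as many if every operation becomes a cancel-repost), while the underlying trading activity stays equivalent. Throughput then stabilizes at ~130 ops/sec.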
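The round-trip argument can be made concrete with a simple model of one latency-bound agent. This Rust sketch is hypothetical: it assumes the agent sends requests back to back and ignores think time, pipelining, and server-side queueing. `agent_ops_per_sec` and the 20 ms round-trip time are illustrative, not taken from the benchmark harness.

```rust
/// Hypothetical model of one market-maker agent that sends requests back to back
/// and waits for each response. `cancel_repost_share` is the fraction of operations
/// that are cancel-reposts (two round-trips); the rest are new orders (one round-trip).
/// Ignores think time, pipelining, and server-side queueing.
fn agent_ops_per_sec(rtt_ms: f64, cancel_repost_share: f64) -> f64 {
    assert!(rtt_ms > 0.0, "round-trip time must be positive");
    assert!((0.0..=1.0).contains(&cancel_repost_share), "share must be in [0, 1]");
    // Expected round-trips per operation: 1 * (1 - s) + 2 * s = 1 + s.
    let round_trips_per_op = 1.0 + cancel_repost_share;
    1_000.0 / (rtt_ms * round_trips_per_op)
}

fn main() {
    let rtt_ms = 20.0; // hypothetical round-trip time, not a measured value
    let before = agent_ops_per_sec(rtt_ms, 0.0); // all new orders
    let after = agent_ops_per_sec(rtt_ms, 1.0); // all cancel-reposts
    println!("before: {before:.1} ops/sec, after: {after:.1} ops/sec");
}
```

In this model, shifting the whole workload to cancel-repost halves the per-agent rate, and a partial shift produces a smaller drop.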

Memory usage (RSS)

Memory grows by ~17 MB over 300 seconds with 5 active market-makers. No unbounded growth was observed within this window.
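One way to quantify memory growth from periodic RSS samples is a least-squares slope, compared between the whole run and its later part. The report does not describe how the memory chart was sampled, so this Rust sketch is an assumption; `rss_slope_mb_per_s` is a hypothetical helper and the trace is illustrative.

```rust
/// Ordinary least-squares slope of RSS (MB) against elapsed time (s).
fn rss_slope_mb_per_s(samples: &[(f64, f64)]) -> f64 {
    assert!(samples.len() >= 2, "need at least two samples");
    let n = samples.len() as f64;
    let mean_t = samples.iter().map(|&(t, _)| t).sum::<f64>() / n;
    let mean_mb = samples.iter().map(|&(_, mb)| mb).sum::<f64>() / n;
    let (mut cov, mut var) = (0.0_f64, 0.0_f64);
    for &(t, mb) in samples {
        cov += (t - mean_t) * (mb - mean_mb);
        var += (t - mean_t).powi(2);
    }
    assert!(var > 0.0, "timestamps must not all be equal");
    cov / var
}

fn main() {
    // Hypothetical RSS trace (seconds, MB): grows early, then flattens.
    let trace = [(0.0, 40.0), (60.0, 46.0), (120.0, 49.0), (180.0, 50.0), (240.0, 50.5), (300.0, 51.0)];
    let whole = rss_slope_mb_per_s(&trace);
    let second_half = rss_slope_mb_per_s(&trace[3..]);
    println!("whole run: {whole:.4} MB/s, second half: {second_half:.4} MB/s");
}
```

The reported ~17 MB over 300 s averages about 0.057 MB/s (roughly 3.4 MB per minute). A slope near zero over the later part of a run supports a bounded-growth reading; a steady positive slope does not.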

300-second summary

Metric	Value
Total operations	49,129
Total fills	4,717
Fill rate	9.60%
Peak throughput	156 ops/sec
Mean throughput	134 ops/sec
Throughput stability	85.6%
p50 match latency	35.6 ms
p99 match latency	63.1 ms
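The summary figures can be derived from the 5-second throughput buckets and the operation and fill counts. This Rust sketch is hedged: fill rate as fills per operation reproduces the reported 9.60%, but the report does not define "throughput stability", so mean / peak is an assumption. `summarize` and `fill_rate` are hypothetical helpers and the bucket values are illustrative.

```rust
/// Summary statistics over a throughput time series of ops/sec values,
/// one per 5-second bucket.
struct ThroughputSummary {
    peak: f64,
    mean: f64,
    stability: f64,
}

fn summarize(buckets: &[f64]) -> ThroughputSummary {
    assert!(!buckets.is_empty(), "need at least one bucket");
    let peak = buckets.iter().copied().fold(f64::MIN, f64::max);
    let mean = buckets.iter().sum::<f64>() / buckets.len() as f64;
    // Assumption: stability = mean / peak. The report does not define this metric.
    ThroughputSummary { peak, mean, stability: mean / peak }
}

/// Fill rate as fills per operation submitted.
fn fill_rate(fills: u64, operations: u64) -> f64 {
    assert!(operations > 0, "need at least one operation");
    fills as f64 / operations as f64
}

fn main() {
    let s = summarize(&[120.0, 156.0, 130.0, 128.0]); // hypothetical buckets
    println!("peak {:.0}, mean {:.1}, stability {:.1}%", s.peak, s.mean, s.stability * 100.0);
    println!("fill rate {:.2}%", fill_rate(4_717, 49_129) * 100.0);
}
```

With the reported totals, `fill_rate(4_717, 49_129)` is about 0.0960, matching the 9.60% figure. Applying mean / peak to the rounded headline values gives 134 / 156 ≈ 0.859, close to the reported 85.6%, though the report may use a different definition or unrounded inputs.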

Table 1 — Engine-layer isolation · Both measured in isolation, no consensus, no networking, no signature verification on hot path

Engine	Hardware	Throughput ceiling	Methodology
Vela	Apple M3	2,500,000 ops/sec	Criterion.rs, release build
Pulse	Apple M2 Pro	125,000 ops/sec¹	Published isolation benchmark

M3 vs M2 Pro is the only hardware confound. ¹ Pulse published figure, not independently verified.
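Converting each throughput ceiling into an average time budget per operation makes the Table 1 gap easier to reason about. This minimal Rust sketch assumes operations are processed strictly one after another; the report does not say whether either engine is single-threaded. `ns_per_op` is a hypothetical helper.

```rust
/// Average time per operation implied by a throughput ceiling, assuming
/// operations are processed one after another on a single core.
fn ns_per_op(ops_per_sec: f64) -> f64 {
    assert!(ops_per_sec > 0.0, "throughput must be positive");
    1e9 / ops_per_sec
}

fn main() {
    let vela = ns_per_op(2_500_000.0); // Table 1, Apple M3
    let pulse = ns_per_op(125_000.0); // Table 1, Apple M2 Pro (published, unverified)
    println!("Vela {vela:.0} ns/op, Pulse {pulse:.0} ns/op, ratio {:.0}x", pulse / vela);
}
```

This gives 400 ns per operation for Vela and 8,000 ns for Pulse, a 20× ratio before accounting for the M3 vs M2 Pro difference. The 400 ns budget is close to the 0.38 µs engine p50 in the headline figures.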

Table 2 — System-level reference · Hyperliquid published figures · Includes HyperBFT consensus and real networking · Not directly comparable to Table 1

Metric	Hyperliquid (published)	Includes
Execution layer ceiling	200,000 ops/sec	Custom HyperCore binary + HyperBFT
Consensus throughput (theoretical)	>1,000,000 ops/sec	HyperBFT documented upper bound
End-to-end p50 (colocated)	200 ms	Full round-trip incl. 2-of-3 BFT validator round-trips
End-to-end p99 (colocated)	900 ms	Full round-trip incl. BFT consensus

Hyperliquid figures are taken from the official Hyperliquid documentation. Vela has no comparable system-level figures yet; they will be published once a consensus layer and a real-networking benchmark exist.

Scope and limitations

These benchmarks show that the matching algorithm is fast in isolation. They do not cover BFT consensus, networking beyond the 127.0.0.1 loopback, signature verification on the hot path, deep order-book state, large MPT state, or sustained load beyond 5 minutes. The system-level benchmark suite will be published separately once a consensus layer exists. The Pulse comparison (Table 1) uses an equivalent methodology (both are isolation benchmarks) and is the most credible cross-engine figure here, subject to the M3 vs M2 Pro hardware difference and the unverified Pulse figure. The Hyperliquid figures (Table 2) are system-level and not directly comparable to any figure in this report.