Benchmarks
Benchmark-focused articles with documented methodology. When we report a number, we also report how we got it: sample sizes, hardware, variance, and the caveats that vendor marketing usually omits.
-
Apple M4 Max first NPU benchmarks: TFLOPS-per-watt analysis
ViT-L/16 forward-pass latency and TFLOPS-per-watt measurements on the M4 Max 38 TOPS Neural Engine, with M3 Max and RTX 4090 comparisons.
By Lukas Berg ·
-
Inside PlateLens's Calorie-Accuracy Claim: A Technical Replication
End-to-end accuracy benchmark against MyFitnessPal, Cronometer, Foodvisor, and Bitesnap on 418 professionally plated meals.
By Dr. Marcus Brennan ·
-
On-device vs cloud inference: per-million-inference cost across six targets
Cost, battery, and latency measurements for ANE, Hexagon, Tensor G3, Inferentia2, TPU v5e, and L4.
By Lukas Berg ·
-
Rust serving p50/p99 vs Python: a tokenizer and inference overhead benchmark
p50 and p99 request overhead for FastAPI, axum + tch-rs, and axum + Candle serving stacks.
By Lukas Berg ·
-
Edge ML inference: iPhone vs Android TFLite benchmarks
Latency and accuracy across Core ML and TFLite paths on iPhone 14 Pro and Pixel 7 Pro.
By Dr. Marcus Brennan ·
-
Production-scale vision transformers: cost per inference in 2025
ViT-B, ViT-L, and ViT-H cost per million inferences across AWS, GCP, and on-premises hardware.
By Priya Ramachandran ·
-
Depth estimation from single RGB images: state of 2025
Benchmark comparison of MiDaS 3.1, ZoeDepth, Depth Anything, and Marigold on NYU Depth V2 and in-the-wild datasets.
By Dr. Marcus Brennan ·
-
Why accuracy benchmarks mislead: variance, sample size, methodology
How to read published accuracy numbers and what to check before trusting a benchmark.
By Dr. Nadia Volkov ·