About ML Systems Review
Why a group of engineers started an independent publication about production machine learning in 2023 — and how we keep it running.
ML Systems Review (MLSR) is an independent engineering publication founded in May 2023 that covers production machine learning systems: architecture, benchmarks, reliability, and failure modes. The masthead is five engineers and researchers with graduate degrees from Stanford, CMU, Berkeley, KTH, and Oxford, and decades of combined production ML experience across startups, mid-sized tech, and consulting. We take no sponsorships, run no affiliate links, and review every article for technical accuracy before publication.
Why ML Systems Review exists
In early 2023, when the four of us first started trading drafts on a shared Google Doc, the gap we kept noticing was not a shortage of machine-learning content. It was the opposite. Every week another preprint dropped, another blog post made the rounds, another Twitter thread claimed a breakthrough. The problem was that almost none of it spoke to the engineers actually running models in production.
Academic ML has its venues — NeurIPS, ICML, ICLR — and they are excellent at what they do. Consumer tech has its venues too, and they are excellent at selling products. What almost no one was writing about was the middle layer: the part where a trained model has to run on a user's phone at 60 frames per second, or where a feature store has to serve 200,000 lookups per second without breaking the p99 latency budget, or where a data-drift alert fires at 3 a.m. on a Sunday and the on-call engineer has to decide whether to roll back.
That middle layer — production ML systems — is where most of the real engineering work happens, and it is where failures are most expensive. It deserves a publication that treats it with the same seriousness that IEEE Software brings to programming languages or that distill.pub brought to interpretability research. ML Systems Review is our attempt.
What we publish
Our beat is production machine learning. In practice, that means four overlapping categories:
- Architecture case studies. How real companies structured real systems. Figma's multiplayer engine, Discord's shift toward Rust, Plaid's bank-integration API. Sometimes based on published engineering blogs, sometimes on conference talks, sometimes on interviews with engineers willing to go on background.
- Benchmarks and comparisons. Not marketing benchmarks. Real measurements with documented methodology — variance, sample size, confidence intervals. If we report a number, we also report how we got it.
- MLOps and reliability. The unglamorous work: monitoring, retraining, data drift, feature pipelines, on-call rotation. The parts that make the difference between a model that works in a notebook and a model that works on a Tuesday afternoon in production.
- Post-mortems. When production ML systems fail publicly — Zillow's iBuying collapse is the canonical case — we try to write up which engineering decisions contributed. Not to assign blame, but because the industry keeps re-learning the same lessons.
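The benchmark commitment above — report the number and report how you got it — can be sketched in a few lines. This is an illustrative Python sketch, not MLSR's actual tooling; the function name and the sample latencies are invented for the example. It reports a mean latency together with its sample size, a nearest-rank p99, and a 95% bootstrap confidence interval.

```python
import math
import random
import statistics

def summarize_latencies(samples_ms, n_boot=2000, seed=0):
    """Report a latency figure along with how it was measured:
    sample size, nearest-rank p99, and a 95% bootstrap CI on the mean."""
    rng = random.Random(seed)  # fixed seed so the report is reproducible
    n = len(samples_ms)
    # Bootstrap: resample with replacement, collect the mean of each resample.
    boot_means = sorted(
        statistics.fmean(rng.choices(samples_ms, k=n)) for _ in range(n_boot)
    )
    # Nearest-rank p99 over the observed samples.
    p99 = sorted(samples_ms)[max(0, math.ceil(0.99 * n) - 1)]
    return {
        "n": n,
        "mean_ms": statistics.fmean(samples_ms),
        "p99_ms": p99,
        # 2.5th and 97.5th percentiles of the bootstrap distribution.
        "ci95_ms": (boot_means[int(0.025 * n_boot)],
                    boot_means[int(0.975 * n_boot)]),
    }

# Hypothetical measurements from a single run; note the one slow outlier.
samples = [12.1, 11.8, 13.0, 12.4, 50.2, 12.0, 11.9, 12.6, 12.2, 12.3]
report = summarize_latencies(samples)
```

With a heavy-tailed sample like this, the wide confidence interval is itself the finding: a single mean figure without it would overstate how well the latency is known.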
The masthead
MLSR has five people on the masthead. All five have worked on production systems at companies where a wrong model prediction cost real money.
Dr. Nadia Volkov, PhD — Editor-in-Chief
PhD in Machine Learning Systems from Stanford (2018), advised by Prof. Christopher Ré on ML infrastructure and weak supervision. Eight years of industry experience since, working on production inference systems at mid-sized tech companies. Writes the methodology pieces and sets the editorial direction. Expertise: ML systems, distributed inference, benchmark methodology.
Dr. Marcus Brennan, PhD — Deputy Editor
PhD from Carnegie Mellon's Robotics Institute (2017), focused on self-supervised learning for vision. Six years since in industry on applied vision problems — self-supervised pretraining, depth estimation, and visual understanding. Leads our computer-vision coverage. Expertise: vision transformers, depth estimation, neural architecture design.
Priya Ramachandran, MS — Staff Writer
MS in Computer Science from UC Berkeley (2019). Spent several years building training and serving platforms at a mid-sized self-driving company, working on distributed training orchestration and checkpoint management. Covers distributed systems, infrastructure, and platform engineering. Expertise: distributed training, checkpointing, orchestration.
Lukas Berg, MS — Staff Writer
MS from KTH Royal Institute of Technology. Ten years of production experience across ride-sharing and logistics platforms, owning the training-to-deployment pipeline for perception and forecasting models. Writes our MLOps and reliability pieces. Expertise: model monitoring, CI/CD for ML, data-drift detection.
Dr. Theo Nakamura, PhD — Technical Reviewer
PhD in machine learning from Oxford (2016). Senior ML researcher working as an independent consultant, with a background in reinforcement learning and theoretical ML. Reviews every article on the site for technical accuracy before publication. If we claim a number or a mechanism, Theo has pushed back on it. Expertise: reinforcement learning, theoretical ML, peer review.
Editorial independence
MLSR is funded by its founders. We do not run display ads, affiliate links, sponsored posts, or "brought to you by" content. We do not accept free hardware for reviews. When we write about a company's system — Figma, Discord, OpenAI, any of them — we do so without their involvement or review, unless we have explicitly arranged a quote, in which case the quote is attributed and the surrounding analysis is ours.
A few concrete commitments that follow from this:
- Every article lists its author and its technical reviewer. Both are accountable for the claims in the piece.
- Numbers are sourced. If we cite a latency figure, the source is in the article — a paper, a talk, a published benchmark, or our own measurement described in full.
- We publish corrections. When we get something wrong (and we do), we update the article inline, mark the correction, and note what changed.
- We decline coverage when there is a conflict of interest. If a masthead member has worked on a system recently, we either disclose the relationship up top or someone else writes the piece.
How an MLSR article gets made
The process is boring, which is the point. An article starts as a pitch in our shared editorial doc. One of us — usually Nadia — decides whether it fits the beat. The author spends a week or two on research: reading papers, scraping benchmarks, sometimes running small experiments. A first draft goes to a colleague for a structural edit, then to Theo (or, for infrastructure pieces, to Priya) for technical review. Reviewers leave inline comments on specific claims; the author either substantiates, softens, or removes them.
Turnaround is slow by internet standards. A typical article takes three to six weeks from pitch to publication. We would rather be late and correct than fast and wrong.
What MLSR is not
A few disclaimers, because readers have asked:
- MLSR is not a research venue. We do not publish novel experimental results with the expectation that they be cited. We summarise, compare, and analyse.
- MLSR is not a news site. If you want to know what OpenAI shipped this morning, read The Information or TechCrunch. We will usually get around to it a month later, after the dust has settled.
- MLSR is not a career-advice blog. There are better places for "how I got a job at FAANG."
Contact
Tips, corrections, and pitches go to editors@mlsystemsreview.com. We read every email. We reply to most, eventually. If you are an engineer with a story about a production ML system — yours or someone else's — and you want to talk on background, that address works for that too.
— Dr. Nadia Volkov, for the ML Systems Review editorial team.