For decades, we have designed chips in fundamentally the same way: human intuition applied to a vanishingly small slice of an impossibly large design space. That paradigm worked when Moore’s Law was lifting everything. We could afford to be wrong. We could afford to miss the best design. Process scaling would close the gap.
From Idea Scarcity to Evaluation Scarcity
The central claim is simple, but uncomfortable:
Computer architecture is no longer bottlenecked by ideas. It is bottlenecked by evaluation and telemetry.
For decades, the field has implicitly assumed that ideas are scarce — that the role of the architect is to generate the one clever mechanism worth exploring. Everything else follows. But recent evidence suggests the opposite. With modern large language models and agentic pipelines, hundreds of viable architectural ideas can be generated per day, thousands of candidate designs can be evaluated per week, and design cycles can compress from months to weeks.
This is not speculative. We built a system called the
Gauntlet and tested it on 85 papers from ISCA 2025 and HPCA 2026 — largely outside the model’s training data. Across 475 independent runs, it produced viable architectural mechanisms 95% of the time: independently re-deriving authors’ exact solutions in 48% of cases, and proposing valid alternatives the authors never considered in another 50%. Each took 10–20 minutes. This flips a foundational assumption of the field. If ideas are abundant, then the limiting factor is no longer creativity — it is
which ideas we can evaluate, validate, and trust. This
link contains the corpus of problem statements and the Gauntlet's solutions.
1. Evaluation is the new bottleneck
We are moving from a world where the question was “Can we come up with a good idea?” to one where the question becomes “Can we evaluate 10,000 ideas fast enough to find the best one?” This elevates simulation infrastructure, analytical modeling, and verification into the central problems of the field. The “PhD student for three months” implementation bottleneck is already eroding — our system built first-principles performance models from papers in under 20 minutes. What replaces it is a race to build faster, more accurate, and more scalable evaluation pipelines.
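To make the "evaluate 10,000 ideas" framing concrete, here is a minimal sketch of the kind of first-principles analytical model that can serve as a cheap first-pass filter over many candidate designs. It uses a simple roofline bound; all the design points and numbers are illustrative assumptions, not outputs of the Gauntlet:

```python
# Hypothetical sketch: a roofline-style analytical model used to rank
# candidate designs in microseconds, before any detailed simulation.
# Every number here is an illustrative assumption.

def attainable_gflops(peak_gflops, bw_gbps, intensity_flops_per_byte):
    """Roofline bound: performance is capped by compute or by memory bandwidth."""
    return min(peak_gflops, bw_gbps * intensity_flops_per_byte)

# Candidate designs: (name, peak GFLOP/s, memory bandwidth GB/s) -- assumed values
candidates = [
    ("wide-simd", 2000, 100),
    ("more-bw",   1200, 400),
    ("balanced",  1600, 200),
]

workload_intensity = 4.0  # FLOPs per byte, an assumed workload property

# Rank candidates by their attainable performance on this workload
ranked = sorted(
    candidates,
    key=lambda c: attainable_gflops(c[1], c[2], workload_intensity),
    reverse=True,
)
for name, peak, bw in ranked:
    print(name, attainable_gflops(peak, bw, workload_intensity))
```

A model this crude cannot pick the winner, but it can discard the bulk of a large candidate pool cheaply, reserving expensive simulation for the survivors; that division of labor is what an evaluation pipeline at scale looks like.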
2. The telemetry divide
If evaluation becomes central, then ground truth becomes everything. Over time, access to closed-loop deployment telemetry — real workloads, real performance counters, real system behavior at scale and in low-level depth — may matter as much as architectural insight itself. This creates the risk of a structural divide. Academic research, long dependent on proxy benchmarks, could drift further from production reality unless we collectively rethink how we share and access workload data.
3. The end of the old boundary
The traditional separation between “chip company” and “cloud provider” begins to dissolve. Automated architecture requires three tightly coupled capabilities: deployment (to generate telemetry), infrastructure (to evaluate designs at scale), and silicon expertise (to realize designs physically). No single traditional player owns all three. The result is convergence — either through vertical integration or new hybrid ecosystems.
The Deeper Claim
The more provocative claim is not about tools — it is about limits. Human-driven architecture is becoming structurally outmatched by the scale of the design space. This is not a statement about human ability. It is about combinatorics. The architectural search space — spanning parametric and structural choices — is effectively unbounded. Humans sample an infinitesimal fraction of it. That was acceptable in an era of abundance. It is not acceptable in an era where architectural efficiency is the primary lever for progress. The analogy to
AlphaZero is not rhetorical. It is structural: when search, evaluation, and feedback loops become fast enough, intuition gives way to systematic exploration.
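The combinatorics claim is easy to make concrete with a back-of-the-envelope calculation. The parameter counts and evaluation rate below are illustrative assumptions, not figures from the article:

```python
# Back-of-the-envelope sketch of why the design space is effectively
# unbounded. All constants are illustrative assumptions.

params = 30       # independent design knobs (assumed)
choices = 8       # options per knob (assumed)
space = choices ** params  # total parametric configurations

evals_per_week = 10_000            # an aggressive automated-pipeline rate
weeks_per_century = 52 * 100
sampled = evals_per_week * weeks_per_century

fraction = sampled / space
print(f"design space:            {space:.2e}")
print(f"sampled in a century:    {sampled:.2e}")
print(f"fraction explored:       {fraction:.2e}")
```

Even at 10,000 evaluations per week for a century, the explored fraction of a modest 30-knob parametric space is on the order of 10^-20 — and this ignores structural choices entirely. The point is not that search is hopeless, but that unguided human sampling cannot be the primary strategy.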
What This Means for Research — and Teaching
If this framing is even partially correct, it forces a rethinking of what it means to “do” computer architecture research. Several shifts seem likely. If machines can generate many viable solutions, identifying the *right problem* becomes the scarce intellectual act. Evaluation frameworks, modeling techniques, and telemetry integration may matter more than individual architectural ideas. And the reliance on fixed benchmark suites becomes increasingly fragile in a world driven by dynamic, evolving workloads.
The full paper includes a set of predictions and my view of how this plays out. This extends to how we teach. Do we still emphasize canonical microarchitectures, or shift toward trade-off reasoning, evaluation frameworks, and interpreting machine-generated designs? What does it mean to train a researcher when idea generation itself is becoming automated?
A Call for Collaboration
This is not a settled direction — it is a hypothesis that needs to be stress-tested by the community. If this resonates (or if you think it is completely wrong), I would love to engage on: new models for teaching architecture, shared evaluation infrastructure and artifacts, privacy-preserving approaches to workload telemetry, and workshops focused on problem formulation rather than solution novelty. If this is even half right, we may need to rethink our identity as a field. Let’s debate it.
About the author: Karthikeyan Sankaralingam is Principal Research Scientist at NVIDIA and Professor at UW-Madison.
Disclaimer: These posts are written by individual contributors to share their thoughts on the Computer Architecture Today blog for the benefit of the community. Any views or opinions represented in this blog are personal, belong solely to the blog author and do not represent those of ACM SIGARCH or its parent organization, ACM.