Intelligence per Watt: Measuring Intelligence Efficiency of Local AI
By Jon Saad-Falcon and 14 other authors
Abstract: Large language model (LLM) queries are predominantly processed by frontier models in centralized cloud infrastructure. Rapidly growing demand is straining this paradigm, and cloud providers are struggling to scale infrastructure quickly enough. Two advances enable us to rethink it: small LMs (<=20 billion active parameters) now achieve performance competitive with frontier models on many tasks, and local accelerators (e.g., Apple M4 Max) run these models at interactive latencies. This raises the question: can local inference viably redistribute demand from centralized infrastructure? Answering it requires measuring whether local LMs can accurately answer real-world queries, and whether they can do so efficiently enough to be practical on power-constrained devices (e.g., laptops). We propose intelligence per watt (IPW), task accuracy divided by unit power, as a metric for assessing the capability and efficiency of local inference across model-accelerator pairs. We conduct a large-scale empirical study across more than 20 local LMs, 8 accelerators, and a representative subset of LLM traffic: 1 million real-world single-turn chat and reasoning queries. For each query, we measure accuracy, energy, latency, and power. Our analysis reveals three key findings. First, local LMs can accurately answer 88.7% of single-turn chat and reasoning queries, with accuracy varying by domain. Second, from 2023 to 2025, IPW improved 5.3-fold and local query coverage increased from 23.2% to 71.3%. Third, local accelerators achieve at least 1.4x lower IPW than cloud accelerators running identical models, revealing significant room for improvement. These results demonstrate that local inference can meaningfully redistribute demand from centralized infrastructure, with IPW serving as a critical metric for tracking this transition. We release our IPW profiles to enable systematic measurement of intelligence per watt.
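The IPW metric defined above (task accuracy divided by unit power) can be sketched as a simple computation. The function below is a minimal illustration, not the paper's released profiling code; the accelerator-model pairs and numbers are hypothetical.

```python
def intelligence_per_watt(accuracy: float, avg_power_watts: float) -> float:
    """Intelligence per watt: task accuracy divided by average power draw.

    accuracy: fraction of queries answered correctly, in [0, 1]
    avg_power_watts: mean power consumed while serving those queries
    """
    if avg_power_watts <= 0:
        raise ValueError("power must be positive")
    return accuracy / avg_power_watts


# Hypothetical model-accelerator pairs with illustrative numbers only.
profiles = {
    "small-LM-on-laptop-accelerator": {"accuracy": 0.71, "avg_power_watts": 40.0},
    "frontier-LM-on-cloud-GPU": {"accuracy": 0.90, "avg_power_watts": 700.0},
}

for name, p in profiles.items():
    ipw = intelligence_per_watt(p["accuracy"], p["avg_power_watts"])
    print(f"{name}: IPW = {ipw:.5f} accuracy/W")
```

Note that a higher-accuracy model can still have a lower IPW if its power draw grows faster than its accuracy, which is the trade-off the metric is designed to surface.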
Submission history
From: Jon Saad-Falcon
[v1]
Tuesday, 11 November 2025, 06:33:30 UTC (5,373 KB)
[v2]
Friday, 14 November 2025 00:53:12 UTC (5,538 KB)