Optical AI Startup Targets 90% Energy Reduction for Inference
Lumai details Iris server family using photons for machine learning workloads
The focus of AI infrastructure is rapidly shifting from model training to inference, driving a new wave of specialized hardware designed to bypass the efficiency limits of traditional silicon. UK-based startup Lumai has detailed its optical computing architecture, which uses light instead of electrons to perform core mathematical operations, promising to slash AI energy consumption by up to 90%.
Key details
Lumai recently launched its Iris family of inference servers—including the Nova, Aura, and Tetra systems—which the company says marks the first time billion-parameter large language models (LLMs) have been run in real time on an optical computing system. Unlike conventional GPUs that rely on digital silicon, Lumai’s architecture uses a hybrid electro-optical approach: digital processing handles system control, while an optical tensor core performs the massive matrix multiplications required for LLM inference.
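To see why an optical tensor core targets matrix multiplication specifically, it helps to count where inference FLOPs actually go. The sketch below tallies the per-token matmul operations in one transformer layer, using dimensions loosely modeled on Llama 3.1 8B (d_model=4096, d_ff=14336); the exact figures are my own back-of-envelope arithmetic, not numbers from the article.

```python
# Rough FLOP count for one transformer layer's matrix multiplications,
# illustrating why matmuls dominate LLM inference cost.
# Dimensions are assumptions loosely based on Llama 3.1 8B.

def layer_matmul_flops(d_model: int, d_ff: int) -> int:
    """FLOPs for one token through one layer's main matmuls.

    A (d_model,) vector times a (d_model, n) matrix costs ~2*d_model*n FLOPs.
    """
    attn = 4 * (2 * d_model * d_model)   # Q, K, V, and output projections
    mlp = 3 * (2 * d_model * d_ff)       # gate, up, and down projections
    return attn + mlp

flops = layer_matmul_flops(4096, 14336)
print(f"~{flops / 1e6:.0f} MFLOPs of matmul per token per layer")
```

Nearly half a billion multiply-accumulate operations per token, per layer, before attention scores are even computed: offloading exactly this workload is what an optical tensor core is for.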
The startup claims its next-generation Iris Tetra systems are targeting one exaOPS (10^18 operations per second) of AI performance within a 10kW power budget by 2029. In current evaluations with hyperscalers and "neoclouds", the technology has run models such as Llama 3.1 8B and 70B while, the company says, dramatically reducing the "energy wall" that currently constrains data center expansion.
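The 2029 target is easy to sanity-check as a ratio. This arithmetic is my own, derived only from the two figures stated above:

```python
# Lumai's stated 2029 target, expressed as efficiency:
# one exaOPS within a 10 kW power budget.

target_ops = 1e18       # 1 exaOPS = 10^18 operations per second
power_watts = 10_000    # 10 kW

ops_per_watt = target_ops / power_watts
print(f"{ops_per_watt:.1e} OPS/W")  # 1.0e+14 operations per second per watt
```

That works out to 100 teraOPS per watt, which gives a concrete number against which future silicon and optical accelerators can be compared.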
Why this matters
As AI adoption scales, the energy demand for inference is expected to surpass training, putting immense strain on global power grids. Traditional silicon-based architectures are hitting thermal and physical limits, where each incremental performance gain requires disproportionately more power. By moving from electrons to photons, optical compute offers a potential 10x increase in performance-per-watt, enabling AI scaling that is otherwise environmentally and economically unsustainable.
Context
The emergence of optical compute comes as the industry moves toward "disaggregated inference." Companies like NVIDIA, AWS, and Intel are increasingly pairing different types of hardware for "prefill" (compute-heavy) and "decode" (bandwidth-constrained) operations. Lumai is positioning its optical processors to excel in the prefill stage, where they can process tokens at massive scale with minimal heat waste compared to traditional high-end GPUs.
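The prefill/decode split above comes down to arithmetic intensity: prefill multiplies the model's weights against thousands of tokens at once, so each weight fetched from memory is reused many times, while decode touches every weight for a single token. The toy calculation below is my own simplification to illustrate that ratio; the dimension and token counts are illustrative assumptions.

```python
# Illustrative arithmetic-intensity comparison (a simplification, not from
# the article): multiplying a (d, d) weight matrix against n token
# activations costs 2*d*d*n FLOPs while reading the d*d weights once,
# so FLOPs per weight byte grow linearly with n.

def flops_per_weight_byte(d: int, n_tokens: int, bytes_per_weight: int = 2) -> float:
    flops = 2 * d * d * n_tokens            # one GEMM over n tokens
    weight_bytes = d * d * bytes_per_weight  # FP16 weights read once
    return flops / weight_bytes

print(flops_per_weight_byte(4096, 2048))  # prefill: long prompt, compute-bound
print(flops_per_weight_byte(4096, 1))     # decode: one token, bandwidth-bound
```

With a 2,048-token prompt, every weight byte read supports ~2,048 FLOPs; at decode it supports one. That three-orders-of-magnitude gap is why compute-dense accelerators, optical or otherwise, are being aimed at the prefill stage.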
What happens next
Lumai has opened its Iris Nova servers for evaluation by hyperscalers and research institutions. The company plans to refine its 3D optical architecture to support increasingly larger models and tighter integration with existing data center cooling and power infrastructure. As utility companies and regulators begin to mandate stricter energy efficiency standards for data centers, the commercial adoption of post-silicon technologies like optical compute will be a critical trend to watch through the end of the decade.
Source: The Register. Published on AI Usage Global; author: AUG Bot.