LIMO: Low-Power In-Memory-Annealer and Matrix-Multiplication Primitive for Edge Computing

📅 2025-12-29
🤖 AI Summary
To address energy-efficiency bottlenecks in large-scale combinatorial optimization (e.g., the Traveling Salesman Problem) and edge-device neural network inference, this paper proposes LIMO: a low-power in-memory annealing architecture. LIMO introduces a novel mixed-signal in-memory annealing macro that integrates stochastic switching of spin-transfer-torque magnetic tunnel junctions (STT-MTJs) with a divide-and-conquer refinement strategy, enhancing state-space exploration and enabling effective escape from local optima. It supports dual-mode computation—annealing-based optimization and vector-matrix multiplication (VMM)—enabling hardware reuse. Evaluated on an 85,900-city TSP instance, LIMO achieves significantly higher solution quality and faster convergence than prior in-memory annealers. For neural inference tasks—including image classification and face detection—it attains software-level accuracy while reducing latency and energy consumption relative to state-of-the-art compute-in-memory (CiM) baselines.
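The divide-and-conquer refinement idea can be illustrated in plain software. The sketch below is an analogy, not the paper's hardware algorithm: cities are partitioned into a spatial grid, each cell is ordered independently (here with a simple nearest-neighbour pass standing in for a per-macro annealing solve), and the partial tours are concatenated. The function name, grid scheme, and cell solver are all illustrative assumptions.

```python
import math

def split_and_stitch(points, grid=2):
    """Toy divide-and-conquer TSP: bucket cities into a grid of cells,
    order each cell greedily (nearest neighbour), then concatenate the
    cells. The per-cell solves are independent, mirroring how refinement
    can be parallelized across multiple annealing macros."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, x1 = min(xs), max(xs)
    y0, y1 = min(ys), max(ys)
    cells = [[] for _ in range(grid * grid)]
    for idx, (x, y) in enumerate(points):
        cx = min(int((x - x0) / (x1 - x0 + 1e-12) * grid), grid - 1)
        cy = min(int((y - y0) / (y1 - y0 + 1e-12) * grid), grid - 1)
        cells[cy * grid + cx].append(idx)
    tour = []
    for cell in cells:                      # each cell is solvable in parallel
        remaining = cell[:]
        while remaining:
            if tour:
                last = points[tour[-1]]
                nxt = min(remaining, key=lambda i: math.dist(points[i], last))
            else:
                nxt = remaining[0]
            remaining.remove(nxt)
            tour.append(nxt)
    return tour
```

In the real system the "stitch" step is itself refined by further annealing; here it is only a concatenation, which is what makes this a sketch rather than a solver.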

📝 Abstract
Combinatorial optimization (CO) underpins applications in science and engineering, ranging from logistics to electronic design automation. A classic example is the NP-complete Traveling Salesman Problem (TSP). Finding exact solutions for large-scale TSP instances remains computationally intractable; on von Neumann architectures, such solvers are constrained by the memory wall, incurring compute-memory traffic that grows with instance size. Metaheuristics, such as simulated annealing implemented on compute-in-memory (CiM) architectures, offer a way to mitigate the von Neumann bottleneck. This is accomplished by performing in-memory optimization cycles to rapidly find approximate solutions for TSP instances. Yet this approach suffers from degrading solution quality as instance size increases, owing to inefficient state-space exploration. To address this, we present LIMO, a mixed-signal computational macro that implements an in-memory annealing algorithm with reduced search-space complexity. The annealing process is aided by the stochastic switching of spin-transfer-torque magnetic tunnel junctions (STT-MTJs) to escape local minima. For large instances, our macro co-design is complemented by a refinement-based divide-and-conquer algorithm amenable to parallel optimization in a spatial architecture. Consequently, our system, comprising several LIMO macros, achieves superior solution quality and faster time-to-solution on instances up to 85,900 cities compared to prior hardware annealers. The modularity of our annealing peripherals allows the LIMO macro to be reused for other applications, such as vector-matrix multiplications (VMMs). This enables our architecture to support neural network inference. As an illustration, we show image classification and face detection with software-comparable accuracy, while achieving lower latency and energy consumption than baseline CiM architectures.
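The core annealing loop the abstract describes can be sketched in conventional software. In this sketch, pseudo-random acceptance of uphill moves stands in for the stochastic STT-MTJ switching that the macro uses to escape local minima; all names and parameters (`anneal_tsp`, `t0`, the 2-city swap move) are illustrative assumptions, not details from the paper.

```python
import math
import random

def tour_length(tour, dist):
    """Total length of the closed tour under the distance matrix `dist`."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def anneal_tsp(dist, steps=20000, t0=1.0, t_min=1e-3, seed=0):
    """Simulated annealing over 2-city swaps with a geometric cooling schedule."""
    rng = random.Random(seed)
    n = len(dist)
    tour = list(range(n))
    rng.shuffle(tour)
    best, best_len = tour[:], tour_length(tour, dist)
    cur_len = best_len
    t = t0
    cooling = (t_min / t0) ** (1.0 / steps)
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        tour[i], tour[j] = tour[j], tour[i]      # propose a 2-city swap
        new_len = tour_length(tour, dist)
        delta = new_len - cur_len
        # Accept uphill moves with probability exp(-delta/t): this is the
        # stochastic escape from local minima (the role the hardware
        # delegates to STT-MTJ switching noise).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cur_len = new_len
            if cur_len < best_len:
                best, best_len = tour[:], cur_len
        else:
            tour[i], tour[j] = tour[j], tour[i]  # revert the swap
        t *= cooling                             # cool the temperature
    return best, best_len
```

The hardware advantage claimed in the abstract is precisely that the inner loop (energy evaluation plus stochastic accept/reject) happens inside the memory array instead of shuttling the state back and forth across a memory bus.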
Problem

Research questions and friction points this paper is trying to address.

Addresses computational intractability of large-scale combinatorial optimization problems like TSP.
Mitigates von Neumann bottleneck and inefficient state-space exploration in in-memory annealing.
Enables efficient edge computing for optimization and neural network inference with low power.
Innovation

Methods, ideas, or system contributions that make the work stand out.

In-memory annealing with STT-MTJ stochastic switching
Refinement-based divide-and-conquer algorithm for parallel optimization
Modular macro reused for vector-matrix multiplication and neural networks
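The VMM reuse in the last bullet works because a memory crossbar naturally computes dot products: input voltages applied to rows multiply the stored conductances, and the resulting currents sum along each column (Kirchhoff's current law). A minimal numerical model of that behaviour is sketched below, using a differential conductance pair for signed weights and a finite number of conductance levels; the function name and parameters (`levels`, `g_max`) are assumptions for illustration, not the paper's design values.

```python
import numpy as np

def crossbar_vmm(weights, x, levels=16, g_max=1.0):
    """Idealized crossbar VMM: map each weight to quantized positive/negative
    conductances (a differential device pair for signed values), drive the
    rows with the input vector, and read the per-column current sums."""
    w = np.asarray(weights, dtype=float)
    scale = float(np.abs(w).max()) or 1.0
    # Quantize magnitudes to `levels` discrete conductance states per device.
    g_pos = np.round(np.clip(w, 0, None) / scale * (levels - 1)) / (levels - 1) * g_max
    g_neg = np.round(np.clip(-w, 0, None) / scale * (levels - 1)) / (levels - 1) * g_max
    # Differential column currents, rescaled back to weight units.
    return (x @ (g_pos - g_neg).T) * scale / g_max
```

With enough conductance levels the result approaches the exact matrix-vector product, which is why the same array that stores annealing coupling weights can serve neural-network layers after a peripheral reconfiguration.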
Amod Holla
Purdue ECE
Hardware-Software Codesign · Compute in Memory · AI Hardware · Emerging Memory
Sumedh Chatterjee
Department of Electrical Engineering, Indian Institute of Technology, Chennai, 603103, Tamil Nadu, India.
Sutanu Sen
Department Of Electronics And Electrical Communication Engineering, Indian Institute of Technology, Kharagpur, 721302, West Bengal, India.
Anushka Mukherjee
Elmore School of Electrical and Computer Engineering, Purdue University, West Lafayette, 47907, Indiana, USA.
Fernando García-Redondo
imec, Cambridge, UK.
Dwaipayan Biswas
Program Director - XTCO Memory, IMEC Belgium
System Technology Co-optimisation · DTCO · VLSI digital design
Francesca Iacopi
Fellow @ imec USA; Adjunct Professor @ University of Technology Sydney and @ Purdue University
Electronic Materials · 2D Materials · Metamaterials · Nanoelectronics · Brain-Machine Interfaces
Kaushik Roy
Elmore School of Electrical and Computer Engineering, Purdue University, West Lafayette, 47907, Indiana, USA.