🤖 AI Summary
Deploying large language models (LLMs) efficiently in resource-constrained confidential computing environments remains challenging, particularly in circuit design, where both the trained model and its training data are confidential IP.
Method: This paper presents a systematic evaluation of lightweight LLMs running under Intel Trust Domain Extensions (TDX), a trusted execution environment (TEE). It evaluates distilled models and 4-/8-bit quantized models, and compares throughput in tokens per second across TEE-based, CPU-only, and CPU-GPU hybrid platforms.
Contribution/Results: Distilled models (e.g., DeepSeek) outperform the other evaluated models owing to their smaller parameter counts, and for small models such as DeepSeek-r1-1.5B the TDX implementation outperforms the CPU-only version. Quantized models (Q4 and Q8) achieve up to 3× higher throughput than FP16 models, and the results are further validated on a testbench for SoC design tasks. The work demonstrates the potential of efficiently and securely deploying lightweight LLMs in semiconductor CAD workflows, providing empirical groundwork for confidential AI in hardware design.
📝 Abstract
Large Language Models (LLMs) are increasingly used in circuit design tasks and have typically undergone multiple rounds of training. Both the trained models and their associated training data are considered confidential intellectual property (IP) and must be protected from exposure. Confidential Computing offers a promising way to protect data and models through Trusted Execution Environments (TEEs). However, existing TEE implementations are not designed to efficiently support the resource-intensive nature of LLMs. In this work, we present a comprehensive evaluation of LLMs within a TEE-enabled confidential computing environment, specifically Intel Trust Domain Extensions (TDX). We ran experiments in three environments (TEE-based, CPU-only, and CPU-GPU hybrid implementations) and evaluated their performance in terms of tokens per second.
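To make the tokens-per-second metric concrete, the following is a minimal sketch of how such a throughput figure could be computed. It is not the authors' benchmarking harness: the `generate` callable, the `runs` and `clock` parameters, and the `speedup` helper are hypothetical names introduced here for illustration, and any real inference backend would need to be wrapped to match the assumed signature.

```python
import time
from typing import Callable, List


def tokens_per_second(
    generate: Callable[[str], List[str]],
    prompt: str,
    runs: int = 3,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return generated tokens divided by wall-clock seconds, summed over `runs` calls.

    `generate` is a hypothetical stand-in for an inference call that
    returns the list of generated tokens for `prompt`.
    """
    total_tokens = 0
    total_seconds = 0.0
    for _ in range(runs):
        start = clock()
        tokens = generate(prompt)
        total_seconds += clock() - start
        total_tokens += len(tokens)
    return total_tokens / total_seconds


def speedup(candidate_tps: float, baseline_tps: float) -> float:
    """Ratio of two throughput figures, e.g. a quantized model over FP16."""
    return candidate_tps / baseline_tps


if __name__ == "__main__":
    # Stub backend (assumption): sleeps briefly, then returns 8 fake tokens.
    def fake_generate(prompt: str) -> List[str]:
        time.sleep(0.01)
        return ["tok"] * 8

    print(f"{tokens_per_second(fake_generate, 'Describe an SoC bus.'):.1f} tokens/s")
```

Running the same function against each environment (TEE-based, CPU-only, CPU-GPU) with an identical prompt set would give directly comparable figures. The injectable `clock` is a design choice that keeps the timing logic testable without real delays.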
Our first observation is that distilled models, e.g., DeepSeek, surpass other models in performance due to their smaller parameter counts, making them suitable for resource-constrained devices. In addition, with quantized models, such as 4-bit (Q4) and 8-bit (Q8) quantization, we observed a performance gain of up to 3x compared to FP16 models. Our findings indicate that for models with fewer parameters, such as DeepSeek-r1-1.5B, the TDX implementation outperforms the CPU version while executing computations within a secure environment. We further validate the results using a testbench designed for SoC design tasks. These validations demonstrate the potential of efficiently deploying lightweight LLMs on resource-constrained systems for semiconductor CAD applications.
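As a rough illustration of why quantization matters on resource-constrained systems, the sketch below estimates the weight storage of a model at FP16, Q8, and Q4 precision. It is back-of-the-envelope arithmetic under stated assumptions, not a result from the paper: the parameter count is the nominal 1.5 billion suggested by the name DeepSeek-r1-1.5B, the function name is hypothetical, and real quantized formats store extra scale metadata, so their effective bits per weight are somewhat higher than the nominal 8 or 4.

```python
def weight_footprint_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes).

    Ignores quantization scales, activations, and the KV cache.
    """
    return num_params * bits_per_weight / 8 / 1e9


# Nominal parameter count (assumption taken from the model name).
NOMINAL_PARAMS = 1.5e9

for name, bits in [("FP16", 16), ("Q8", 8), ("Q4", 4)]:
    print(f"{name}: {weight_footprint_gb(NOMINAL_PARAMS, bits):.2f} GB")
```

Under these assumptions the weights shrink from about 3 GB at FP16 to about 1.5 GB at Q8 and 0.75 GB at Q4. The estimate covers storage only and does not by itself predict the throughput gains reported above.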