Distilled Large Language Model in Confidential Computing Environment for System-on-Chip Design

📅 2025-07-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Deploying large language models (LLMs) efficiently in resource-constrained confidential computing environments remains challenging, particularly when the confidentiality of models and training data must be protected in circuit design. Method: This paper presents the first systematic evaluation of lightweight LLMs inside Intel Trust Domain Extensions (TDX), a trusted execution environment (TEE). It combines knowledge distillation with 4-bit and 8-bit quantization and runs comparative experiments across CPU-only, CPU-GPU, and TEE platforms. Contribution/Results: Small distilled models (e.g., DeepSeek-r1-1.5B) run faster in TDX than on the CPU-only system, measured in tokens per second; quantization raises throughput by up to 3× over FP16 while preserving accuracy on SoC design tasks. The work demonstrates that lightweight LLMs can be deployed efficiently and securely in semiconductor CAD workflows, and it provides a reproducible technical pathway and empirical foundation for confidential AI in hardware design.

📝 Abstract

Large Language Models (LLMs) are increasingly used in circuit design tasks and have typically undergone multiple rounds of training. Both the trained models and their associated training data are considered confidential intellectual property (IP) and must be protected from exposure. Confidential Computing offers a promising way to protect data and models through Trusted Execution Environments (TEEs). However, existing TEE implementations are not designed to support the resource-intensive nature of LLMs efficiently. In this work, we first present a comprehensive evaluation of LLMs within a TEE-enabled confidential computing environment, specifically one based on Intel Trust Domain Extensions (TDX). We conducted experiments in three environments (TEE-based, CPU-only, and CPU-GPU hybrid) and evaluated their performance in tokens per second. Our first observation is that distilled models, e.g., DeepSeek, outperform other models because of their smaller parameter counts, making them suitable for resource-constrained devices. For quantized models, such as 4-bit (Q4) and 8-bit (Q8) variants, we observed a performance gain of up to 3x compared to FP16 models. Our findings indicate that for models with fewer parameters, such as DeepSeek-r1-1.5B, the TDX implementation outperforms the CPU version while executing computations within a secure environment. We further validate the results using a testbench designed for SoC design tasks. These validations demonstrate the potential of efficiently deploying lightweight LLMs on resource-constrained systems for semiconductor CAD applications.
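
The abstract reports performance in tokens per second and expresses quantization gains as a ratio against FP16. As a minimal sketch of how such a metric could be computed, and not the authors' benchmarking harness, the Python below times a generation callable and derives a speedup ratio. The names `tokens_per_second`, `speedup`, and `fake_generate` are hypothetical, and `fake_generate` is a stand-in for a real model call in any of the three environments.

```python
import time
from typing import Callable, List


def tokens_per_second(generate: Callable[[str], List[str]],
                      prompt: str, runs: int = 3) -> float:
    """Average throughput: total generated tokens divided by total wall time."""
    total_tokens = 0
    total_seconds = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        tokens = generate(prompt)
        total_seconds += time.perf_counter() - start
        total_tokens += len(tokens)
    return total_tokens / total_seconds


def speedup(candidate_tps: float, baseline_tps: float) -> float:
    """Ratio of a candidate's throughput to a baseline's (e.g. Q4 vs. FP16)."""
    return candidate_tps / baseline_tps


def fake_generate(prompt: str) -> List[str]:
    """Hypothetical stand-in for a model call; sleeps, then returns 8 tokens."""
    time.sleep(0.01)
    return ["tok"] * 8


if __name__ == "__main__":
    tps = tokens_per_second(fake_generate, "Describe an AXI interconnect.")
    print(f"throughput: {tps:.1f} tokens/s")
    # Illustrative numbers only, not results from the paper.
    print(f"speedup: {speedup(30.0, 10.0):.1f}x")
```

Running the same callable signature against each environment and quantization level would yield directly comparable numbers, since the timing logic stays fixed and only the backend changes.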

Problem

Research questions and friction points this paper is trying to address.

Protecting LLM IP in confidential computing for chip design

Optimizing LLM performance in resource-limited TEE environments

Evaluating distilled models for efficient secure deployment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Distilled LLMs enhance performance in TEEs

Quantized models (Q4, Q8) achieve up to 3x speedup over FP16

Lightweight LLMs suit resource-constrained SoC design
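
To make "lightweight" concrete, the sketch below estimates the raw weight storage of a model at the three precisions named in the abstract. It rests on several assumptions: the parameter count is taken as the nominal 1.5 billion from the model name DeepSeek-r1-1.5B, real quantized formats add scale factors and metadata, so actual files are somewhat larger, and the function name `weight_footprint_gb` is hypothetical. The paper does not state that footprint alone explains the measured throughput gains.

```python
# Bits used to store each weight at the precisions named in the abstract.
BITS_PER_WEIGHT = {"FP16": 16, "Q8": 8, "Q4": 4}


def weight_footprint_gb(num_params: float, bits: int) -> float:
    """Raw weight storage in gigabytes (10^9 bytes), ignoring format overhead."""
    return num_params * bits / 8 / 1e9


if __name__ == "__main__":
    nominal_params = 1.5e9  # assumption: nominal size taken from the model name
    for name, bits in BITS_PER_WEIGHT.items():
        print(f"{name}: {weight_footprint_gb(nominal_params, bits):.2f} GB")
```

Under these assumptions the weights take about 3 GB at FP16, 1.5 GB at Q8, and 0.75 GB at Q4, which gives a rough sense of why small quantized models are candidates for memory-limited environments.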
