🤖 AI Summary
To address the challenges of excessive parameter counts, susceptibility to overfitting, and high hardware overhead in deep neural networks, this paper proposes a general model compression framework based on Automatically Differentiable Deep Tensor Networks (ADTN). Unlike conventional matrix- or tensor-based representations, ADTN encodes the parameters of diverse neural network layers, including linear and convolutional layers, into a deep, differentiable tensor network. Its deep structure yields an exponential reduction in free parameters, and the network is optimized end to end. On VGG-16, two linear layers with roughly $10^{7}$ parameters are compressed to just 424 parameters, while CIFAR-10 test accuracy improves by 1.57 percentage points (from 90.17% to 91.74%). The method's generality and efficiency are further validated on FC-2, LeNet-5, AlexNet and ZFNet across MNIST, CIFAR-10 and CIFAR-100. Key contributions include: (i) the first deep automatically differentiable tensor network architecture; (ii) simultaneous achievement of extreme compression, improved generalization, and hardware efficiency; and (iii) a unified, end-to-end trainable framework applicable across diverse network topologies and tasks.
📝 Abstract
A neural network (NN) designed for challenging machine learning tasks is, in general, a highly nonlinear mapping that contains a massive number of variational parameters. The high complexity of an NN, if unbounded or unconstrained, might unpredictably cause severe issues, including overfitting, loss of generalization power, and unbearable hardware cost. In this work, we propose a general compression scheme that significantly reduces the variational parameters of NNs, regardless of their specific types (linear, convolutional, etc.), by encoding them into a deep automatically differentiable tensor network (ADTN) that contains exponentially fewer free parameters. Superior compression performance of our scheme is demonstrated on several widely recognized NNs (FC-2, LeNet-5, AlexNet, ZFNet and VGG-16) and datasets (MNIST, CIFAR-10 and CIFAR-100). For instance, we compress two linear layers in VGG-16 with approximately $10^{7}$ parameters into two ADTNs with just 424 parameters, improving the testing accuracy on CIFAR-10 from 90.17% to 91.74%. We argue that the deep structure of ADTN is an essential reason for its remarkable compression performance, compared with existing compression schemes that are mainly based on tensor decompositions/factorizations and shallow tensor networks. Our work suggests deep tensor networks (TNs) as an exceptionally efficient mathematical structure for representing the variational parameters of NNs, one that exhibits superior compressibility over the commonly used matrices and multi-way arrays.
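To make the "exponentially fewer free parameters" idea concrete, here is a minimal sketch of a generic deep TN: a brick-wall stack of 2×2×2×2 tensors (similar in shape to a quantum circuit), contracted with a fixed all-ones product vector to generate 2^n weights that are then reshaped into a matrix. The brick-wall layout, the all-ones input vector, and every function name (`apply_two_site`, `brick_wall_layers`, `init_tn`, `contract_to_weights`, `num_tn_params`) are hypothetical choices for illustration. This is not claimed to be the paper's exact ADTN construction, and it does not reproduce the 424-parameter figure. It shows only the forward contraction; training would backpropagate through the same contraction with an automatic-differentiation framework.

```python
import random

# Hypothetical generic deep tensor network (TN) for illustration only;
# it is NOT claimed to be the exact ADTN construction of the paper.

def apply_two_site(vec, gate, i, n):
    """Contract a 2x2x2x2 tensor gate[a][b][c][d] with sites i and i+1.

    (a, b) are output indices and (c, d) input indices. Site 0 is the most
    significant bit of the flat index into vec, which has length 2**n.
    """
    bi = n - 1 - i  # bit position of site i
    bj = n - 2 - i  # bit position of site i + 1
    mask = (1 << bi) | (1 << bj)
    out = [0.0] * len(vec)
    for idx in range(len(vec)):
        a = (idx >> bi) & 1
        b = (idx >> bj) & 1
        base = idx & ~mask  # flat index with sites i, i+1 cleared
        total = 0.0
        for c in (0, 1):
            for d in (0, 1):
                total += gate[a][b][c][d] * vec[base | (c << bi) | (d << bj)]
        out[idx] = total
    return out


def brick_wall_layers(n, depth):
    """Site pairs per layer: (0,1),(2,3),... then (1,2),(3,4),..., alternating."""
    return [list(range(layer % 2, n - 1, 2)) for layer in range(depth)]


def init_tn(n, depth, seed=0):
    """Random 2x2x2x2 tensors, one per site pair in each layer."""
    rng = random.Random(seed)

    def random_gate():
        return [[[[rng.gauss(0.0, 0.5) for _ in range(2)] for _ in range(2)]
                 for _ in range(2)] for _ in range(2)]

    return [[random_gate() for _ in pairs] for pairs in brick_wall_layers(n, depth)]


def contract_to_weights(gates, n, depth):
    """Contract the TN layer by layer with an all-ones product vector."""
    vec = [1.0] * (2 ** n)  # fixed input: tensor product of n vectors (1, 1)
    for layer_gates, pairs in zip(gates, brick_wall_layers(n, depth)):
        for gate, i in zip(layer_gates, pairs):
            vec = apply_two_site(vec, gate, i, n)
    return vec


def num_tn_params(n, depth):
    """Free parameters: 16 entries per 2x2x2x2 tensor."""
    return 16 * sum(len(pairs) for pairs in brick_wall_layers(n, depth))


if __name__ == "__main__":
    n, depth = 10, 4
    flat = contract_to_weights(init_tn(n, depth), n, depth)
    weight = [flat[r * 32:(r + 1) * 32] for r in range(32)]  # 32 x 32 matrix
    print(len(weight), len(weight[0]), num_tn_params(n, depth))  # 32 32 288
    for n in (10, 20, 30):
        print(n, 2 ** n, num_tn_params(n, depth))
```

For a fixed depth, the number of generated weights grows as 2^n while the parameter count grows only linearly in n (288, 608 and 928 parameters for n = 10, 20 and 30 at depth 4), which is the kind of exponential gap the abstract refers to. The toy counts all 16 entries of every tensor as free and makes no attempt to remove redundancies.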