🤖 AI Summary
This work addresses the limited domain expertise and multimodal comprehension of large language models in additive manufacturing by proposing a low-cost, efficient domain adaptation strategy. Building on the instruction-tuned variant of Gemma 3, the approach leverages a domain-specific dataset of approximately 50 million tokens drawn from open-access additive manufacturing publications, combining domain-adaptive pretraining, visual instruction tuning, and multimodal fusion techniques to develop the first multimodal large language model tailored to additive manufacturing. The study also introduces the Additive-Manufacturing-Benchmark, the first dedicated evaluation benchmark for this domain. Experimental results show that the proposed model achieves over 90% accuracy on general additive manufacturing knowledge tasks and performs strongly in both the language and vision modalities.
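As a rough illustration of the domain-adaptive pretraining step described above, the sketch below continues causal-language-model training of an instruction-tuned Gemma 3 checkpoint on a domain corpus using LoRA adapters with Hugging Face `transformers` and `peft`. The checkpoint ID, corpus file, and every hyperparameter are placeholder assumptions, not the paper's actual configuration.

```python
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Placeholder checkpoint: the text-only instruction-tuned Gemma 3 1B.
# The paper's actual base variant and size may differ.
model_id = "google/gemma-3-1b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# LoRA keeps the adaptation low-cost: only small adapter matrices on the
# attention projections are trained, not the full model.
model = get_peft_model(
    model,
    LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
               task_type="CAUSAL_LM"),
)

# am_corpus.jsonl (hypothetical): one passage per line as {"text": "..."},
# extracted from open-access additive manufacturing articles.
corpus = load_dataset("json", data_files="am_corpus.jsonl", split="train")
tokenized = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=2048),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="additivellm2-dapt",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=1,
        learning_rate=2e-5,
        bf16=True,
    ),
    train_dataset=tokenized,
    # mlm=False -> standard next-token (causal LM) objective.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```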
📝 Abstract
This work presents AdditiveLLM2, a multimodal, domain-adapted large language model built upon the instruction-tuned variant of the Gemma 3 model using a relatively small dataset of around 50 million tokens. The dataset (AdditiveLLM2-OA) consists of open-access additive manufacturing journal articles, with data extracted for the domain-adaptive pretraining and visual instruction tuning processes. The various stages of the developed model are evaluated with the Additive-Manufacturing-Benchmark, which consists of additive manufacturing domain-specific tasks compiled from published resources. AdditiveLLM2 exhibits proficiency in both language- and vision-based tasks, achieving accuracies upwards of 90% on general additive manufacturing knowledge. This domain-adaptive pretraining and instruction tuning strategy outlines an accessible method for specializing large language models to a domain such as additive manufacturing.
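The benchmark evaluation can be pictured as a simple multiple-choice accuracy loop like the sketch below. The file name, item schema, and model checkpoint are illustrative assumptions; the actual Additive-Manufacturing-Benchmark format is not described here.

```python
import json
from transformers import pipeline

# Hypothetical benchmark file: one multiple-choice item per line, e.g.
# {"question": "...", "choices": ["...", "...", "...", "..."], "answer": "B"}.
generator = pipeline("text-generation", model="additivellm2-dapt")

def accuracy(path: str) -> float:
    correct = total = 0
    with open(path) as f:
        for line in f:
            item = json.loads(line)
            options = "\n".join(
                f"{letter}. {choice}"
                for letter, choice in zip("ABCD", item["choices"])
            )
            prompt = f"{item['question']}\n{options}\nAnswer:"
            output = generator(prompt, max_new_tokens=4)[0]["generated_text"]
            # Grade on the first letter the model emits after the prompt.
            predicted = output[len(prompt):].strip()[:1].upper()
            correct += predicted == item["answer"]
            total += 1
    return correct / total

print(f"accuracy: {accuracy('am_benchmark.jsonl'):.1%}")
```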