🤖 AI Summary
This work addresses the dual challenges of stylistic adaptation and privacy preservation in personalized email composition. We propose Panza, a localized, lightweight solution designed for on-device execution. Methodologically, we introduce Reverse Instructions, a variant of inverse instruction tuning, integrated with retrieval-augmented generation (RAG) and end-to-end fine-tuning of a local large language model (LLM), enabling low-resource training and inference on consumer-grade hardware (e.g., a free-tier Google Colab instance). To rigorously evaluate personalization, we establish the first dedicated benchmark framework for personalized writing and release the new "David" dataset, a curated collection of personalized email correspondence. Experiments demonstrate that Panza accurately captures individual writing styles using only a small number of user-specific historical emails. Quantitative ablation studies confirm the independent efficacy and synergistic benefits of RAG and fine-tuning in enhancing personalization performance. All code and datasets are publicly released.
📝 Abstract
The availability of powerful open-source large language models (LLMs) opens exciting use cases, such as automated personal assistants that adapt to the user's unique data and demands. Two key requirements for such assistants are personalization, in the sense that the assistant should reflect the user's own writing style, and privacy, since users may prefer to always store their personal data locally, on their own computing device. In this application paper, we present a new design and evaluation for such an automated assistant, which we call Panza, for the specific use case of email generation. Specifically, Panza can be trained and deployed locally on commodity hardware, and is personalized to the user's writing style. Panza's personalization features are based on a combination of fine-tuning using a variant of the Reverse Instructions technique together with Retrieval-Augmented Generation (RAG). We demonstrate that this combination allows us to fine-tune an LLM to better reflect a user's writing style using limited data, while executing on extremely limited resources, e.g., on a free Google Colab instance. Our key methodological contribution is what we believe to be the first detailed study of evaluation metrics for this personalized writing task, and of how different choices of system components, e.g., the use of RAG and of different fine-tuning approaches, impact the system's performance. We are releasing the full Panza code as well as a new "David" personalized email dataset licensed for research use, both available at https://github.com/IST-DASLab/PanzaMail.
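To make the two personalization mechanisms concrete, the sketch below illustrates the general shape of a Reverse Instructions + RAG pipeline as the abstract describes it: each past email is turned into an (instruction, email) training pair for fine-tuning, and at inference time stylistically similar past emails are retrieved into the prompt. This is a minimal illustration, not Panza's actual code: the function names (`summarize_to_instruction`, `compose_prompt`) are hypothetical, the LLM call is stubbed with a placeholder, and retrieval uses a toy bag-of-words cosine similarity rather than a real embedding index.

```python
# Hypothetical sketch of a Reverse Instructions + RAG pipeline.
# LLM calls are stubbed; a real system would use a local fine-tuned model.
import math
from collections import Counter

def bow(text):
    """Toy bag-of-words vector used for cosine-similarity retrieval."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def summarize_to_instruction(email):
    # Stub for the Reverse Instructions step: a real system would ask an
    # LLM to write the instruction that this email is a response to.
    return f"Write an email that says: {email[:40]}"

def build_training_pairs(emails):
    # (instruction, email) pairs on which the local LLM is fine-tuned.
    return [(summarize_to_instruction(e), e) for e in emails]

def retrieve(query, emails, k=2):
    # RAG step: fetch the k past emails most similar to the request.
    q = bow(query)
    scored = sorted(emails, key=lambda e: cosine(q, bow(e)), reverse=True)
    return scored[:k]

def compose_prompt(instruction, emails, k=2):
    # Inference-time prompt: retrieved style examples plus the user request.
    examples = "\n---\n".join(retrieve(instruction, emails, k))
    return (f"Style examples from the user's past emails:\n{examples}\n\n"
            f"Instruction: {instruction}\nEmail:")
```

In this framing, fine-tuning on the reverse-instruction pairs teaches the model the user's general voice, while retrieval supplies concrete, query-specific examples; the abstract's ablations study exactly how these two components contribute individually and together.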