TALAN: Task-Aligned Latent Adaptation Networks for Targeted Post-Training of Large Language Models

📅 2026-06-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes TALAN, a sequence-conditioned latent side path inserted into the Transformer residual stream, designed to improve large language models’ performance on reasoning, mathematical, and coding tasks without compromising their general capabilities. TALAN combines input-aware, lightweight activation intervention with low-rank adapters such as LoRA and DoRA, co-training the two in a single supervised fine-tuning (SFT) loop to produce task-aligned, token-level perturbations. Its core components are latent memory compression, token-level perturbation mixing, and a controlled residual writeback, configured along six axes: insertion location, memory size, mixer, writeback rule, trainability scope, and gradient scale. Experiments across four Qwen3-family backbones and four STEM/code benchmarks show mean gains over matched baselines of +1.41 points with LoRA (non-negative on all 16 model-benchmark cells) and +1.85 points with DoRA (positive on 13 of 16 cells), while adding fewer than 1% trainable parameters relative to the backbone and 1.01-1.02x inference overhead versus matched LoRA.

📝 Abstract

Targeted post-training aims to improve reasoning, math, and code without degrading existing strengths. Low-rank adapters are efficient but task-global; activation interventions are input-aware but often require separate probes, vectors, or inference-time steering. We introduce TALAN (Task-Aligned Latent Adaptation Networks), a sequence-conditioned latent side path inserted into a transformer's residual stream and co-trained with a low-rank adapter in one SFT loop. TALAN compresses the active sequence into latent memory, remixes it into token-level perturbations, and writes them back through a controlled residual update. It is configured along six axes: insertion location, memory size, mixer, writeback rule, trainability scope, and gradient scale. Across four Qwen3-family backbones and four STEM/code benchmarks, TALAN improves matched LoRA and DoRA baselines. With LoRA, it yields a +1.41 point cross-model mean gain, positive on all four backbones and non-negative on all 16 model-benchmark cells. With DoRA, it yields a +1.85 point mean gain, positive on all backbones and on 13 of 16 cells. Paired seed checks support positive average effects but show nontrivial variance, so we treat them as sensitivity checks. Cost is small: <1% trainable parameters relative to the backbone and 1.01-1.02x inference overhead versus matched LoRA. A Llama-3.2-1B transfer probe is also positive under LoRA and rsLoRA across seven paired seeds, supporting transfer beyond Qwen. Internal-state analyses suggest TALAN is a small complementary activation intervention. The matched adapter update is 80-1,700x larger than the TALAN perturbation, yet their directions have near-zero cosine; per-layer measurements show this small orthogonal perturbation propagates and amplifies through depth. TALAN offers a practical platform for studying steerable activation-level adaptation within standard adapter-based post-training.
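
The comparison between the adapter update and the TALAN perturbation reported at the end of the abstract can be illustrated with a small diagnostic. The sketch below is not the paper's measurement code. It assumes both updates are available as flat vectors for the same activation, and the names `compare_updates`, `adapter_delta`, and `side_path_delta` are hypothetical. The toy vectors are chosen so the result can be checked by hand.

```python
import math


def l2_norm(v):
    """Euclidean norm of a flat vector."""
    return math.sqrt(sum(x * x for x in v))


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu, nv = l2_norm(u), l2_norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def compare_updates(adapter_delta, side_path_delta):
    """Hypothetical helper: relative size and alignment of two updates."""
    side_norm = l2_norm(side_path_delta)
    if side_norm == 0.0:
        raise ValueError("side-path perturbation is zero; ratio undefined")
    return {
        "norm_ratio": l2_norm(adapter_delta) / side_norm,
        "cosine": cosine(adapter_delta, side_path_delta),
    }


# Toy data (assumption): a large update along one axis and a small
# perturbation along an orthogonal axis.
adapter_delta = [100.0, 0.0, 0.0, 0.0]
side_path_delta = [0.0, 1.0, 0.0, 0.0]
stats = compare_updates(adapter_delta, side_path_delta)
# For these toy vectors, norm_ratio is 100.0 and cosine is 0.0.
```

A large `norm_ratio` together with a near-zero `cosine` is the pattern the abstract describes: the perturbation is much smaller than the adapter update and points in a nearly orthogonal direction.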

Problem

Research questions and friction points this paper is trying to address.

Targeted Post-Training

Large Language Models

Low-Rank Adaptation

Activation Intervention

Task Alignment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Task-Aligned Adaptation

Latent Side Path

Activation Intervention
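
The latent side path named above follows three steps in the abstract: compress the sequence into latent memory, remix that memory into token-level perturbations, and write them back through a controlled residual update. The following is a minimal sketch of that flow, not the authors' implementation. It assumes attention-style pooling for both the compression and the mixing steps and a single scalar `gate` for the writeback; the abstract does not specify these choices, and `side_path`, `queries`, and `gate` are hypothetical names.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax_rows(a):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def side_path(hidden, queries, gate):
    """hidden: T x d token states; queries: m x d latent slots (assumed)."""
    scale = 1.0 / math.sqrt(len(hidden[0]))
    # 1) Compress: each latent slot pools over all tokens -> m x d memory.
    slot_scores = [[s * scale for s in row]
                   for row in matmul(queries, transpose(hidden))]
    memory = matmul(softmax_rows(slot_scores), hidden)
    # 2) Mix: each token reads from the memory -> T x d perturbation.
    token_scores = [[s * scale for s in row]
                    for row in matmul(hidden, transpose(memory))]
    perturbation = matmul(softmax_rows(token_scores), memory)
    # 3) Writeback: gated residual update; gate=0.0 leaves hidden unchanged.
    return [[h + gate * p for h, p in zip(h_row, p_row)]
            for h_row, p_row in zip(hidden, perturbation)]


# Toy shapes (assumption): 5 tokens, width 4, 2 memory slots.
random.seed(0)
hidden = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
queries = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
out = side_path(hidden, queries, gate=0.1)
```

With `gate=0.0` the function returns the hidden states unchanged, which is one common way to let a new side path start as an identity map. Whether TALAN initializes this way is not stated in the abstract.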