This study investigates whether small language models (SLMs) can generate multi-tiered, high-fidelity fake news headlines under explicit prompting, and whether those headlines evade existing detection methods. Using controlled prompt engineering, we systematically evaluate 24,000 fake headlines generated by 14 SLMs, employing DistilBERT-based and ensemble classifiers for both quality grading and authenticity classification. Results show that SLMs reliably follow instructions to produce both high- and low-quality fake headlines; however, their outputs exhibit statistically significant semantic and stylistic divergence from authentic news headlines. Crucially, state-of-the-art detectors achieve only 35.2%–63.5% accuracy in identifying these SLM-generated fakes, revealing critical robustness gaps. This work constitutes the first systematic empirical analysis of the controllability, quality tunability, and detection vulnerability of SLMs in disinformation generation, providing foundational evidence and methodological guidance for developing resilient content safety mechanisms.