TR2025-114
Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in Product QA Agents
- "Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in Product QA Agents", ACL 2025 workshop on Generation, Evaluation & Metrics (GEM), July 2025.BibTeX TR2025-114 PDF
@inproceedings{Lewis2025jul2,
  author = {Lewis, Ashley and White, Michael and Liu, Jing and Koike-Akino, Toshiaki and Parsons, Kieran and Wang, Ye},
  title = {{Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in Product QA Agents}},
  booktitle = {ACL 2025 workshop on Generation, Evaluation \& Metrics (GEM)},
  year = 2025,
  month = jul,
  url = {https://www.merl.com/publications/TR2025-114}
}
Abstract:
The deployment of Large Language Models (LLMs) in customer support is constrained by hallucination—generating false information—and the high cost of proprietary models. To address these challenges, we propose a retrieval-augmented question-answering (QA) pipeline and explore how to balance human input and automation. Using a dataset of questions about a Samsung Smart TV user manual, we demonstrate that synthetic data generated by LLMs outperforms crowdsourced data in reducing hallucination in finetuned models. We also compare self-training (fine-tuning models on their own outputs) and knowledge distillation (fine-tuning on stronger models’ outputs, e.g., GPT-4o), and find that self-training achieves comparable hallucination reduction. We conjecture that this surprising finding can be attributed to increased exposure bias issues in the knowledge distillation case and support this conjecture with post hoc analysis. We also improve robustness to unanswerable questions and retrieval failures with contextualized “I don’t know” responses. These findings show that scalable, cost-efficient QA systems can be built using synthetic data and self-training with open-source models, reducing reliance on proprietary tools or costly human annotations.
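
As a rough illustration of the pipeline the abstract describes, the Python sketch below contrasts how fine-tuning data would be assembled under self-training versus knowledge distillation, and shows a contextualized "I don't know" fallback for retrieval failures. This is not the authors' implementation: the function names (retrieve, generate, idk_response), the retrieval-score threshold, and the Example schema are all hypothetical placeholders for the components named in the abstract.

# Hypothetical sketch of the retrieval-augmented QA fine-tuning setup
# described in the abstract; stub callables stand in for a real
# retriever and for student/teacher LLM calls.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Example:
    question: str
    context: str
    answer: str

RETRIEVAL_THRESHOLD = 0.5  # assumed score below which retrieval counts as failed

def idk_response(question: str) -> str:
    # Contextualized "I don't know" instead of a bare refusal.
    return ("I couldn't find this in the manual; your question "
            f"({question!r}) may be outside its scope.")

def build_finetuning_data(
    questions: list[str],
    retrieve: Callable[[str], tuple[str, float]],  # question -> (passage, score)
    generate: Callable[[str, str], str],           # (question, passage) -> answer
) -> list[Example]:
    """Assemble (question, context, answer) triples for fine-tuning.

    Self-training: pass the student model's own generate function.
    Knowledge distillation: pass a stronger teacher's (e.g., GPT-4o).
    """
    data = []
    for q in questions:
        passage, score = retrieve(q)
        if score < RETRIEVAL_THRESHOLD:
            # Teach the model to decline gracefully on retrieval
            # failures and unanswerable questions.
            data.append(Example(q, passage, idk_response(q)))
        else:
            data.append(Example(q, passage, generate(q, passage)))
    return data

Under this framing, the only difference between the two regimes is whose outputs populate the answer field; the abstract's exposure-bias conjecture is that a student fine-tuned on a teacher's outputs sees a training distribution further from what it produces at inference time than one fine-tuned on its own outputs.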
Related Publications
@article{Lewis2025jul,
  author = {Lewis, Ashley and White, Michael and Liu, Jing and Koike-Akino, Toshiaki and Parsons, Kieran and Wang, Ye},
  title = {{Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in Product QA Agents}},
  journal = {arXiv},
  year = 2025,
  month = jul,
  url = {https://arxiv.org/abs/2502.19545}
}

@article{Lewis2025feb,
  author = {Lewis, Ashley and White, Michael and Liu, Jing and Koike-Akino, Toshiaki and Parsons, Kieran and Wang, Ye},
  title = {{Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in QA Agents}},
  journal = {arXiv},
  year = 2025,
  month = feb,
  url = {https://www.arxiv.org/abs/2502.19545}
}