
Model Distillation

A technique that trains a smaller, faster AI model to replicate the outputs of a larger, more capable one, enabling high-quality performance at lower cost and with reduced compute.

By Sitebard Team | Updated May 28, 2026

In plain English

Model distillation trains a small, fast AI model to behave like a large, slow one. The result is a cheaper model that's nearly as capable, ideal for running at scale or on devices with limited compute.

Technical definition

In knowledge distillation, a student network is trained to minimize a weighted combination of the standard task loss (typically cross-entropy against the ground-truth labels) and a distillation loss computed against the teacher's soft probability outputs, usually softened with a temperature parameter. These soft targets carry richer information than one-hot labels because they show how the teacher spreads probability across the incorrect classes, so the student can absorb the teacher's learned behaviour more efficiently than by training on ground truth alone.
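
The weighted loss described above can be sketched in a few lines of plain Python. This is a minimal illustration for a single classification example, not a definitive implementation: the names `softmax`, `distillation_loss`, `alpha`, and `temperature` are assumptions chosen for the sketch, and scaling the soft-target term by the squared temperature is one common convention rather than a requirement.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label, alpha=0.5, temperature=2.0):
    """Weighted sum of a hard-label task loss and a soft-target distillation loss."""
    # Task loss: cross-entropy of the student's unsoftened prediction vs. the true label.
    student_probs = softmax(student_logits)
    task_loss = -math.log(student_probs[label])

    # Distillation loss: KL(teacher || student) on temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kd_loss = sum(p * math.log(p / q) for p, q in zip(p_teacher, p_student) if p > 0)

    # Assumed convention: scale the soft term by T^2 so both terms stay comparable.
    return alpha * task_loss + (1 - alpha) * (temperature ** 2) * kd_loss

# Hypothetical logits for a 3-class problem where class 0 is correct.
teacher = [4.0, 1.5, 0.2]
close_student = [3.5, 1.4, 0.3]
poor_student = [0.2, 1.5, 4.0]

print(distillation_loss(close_student, teacher, label=0))
print(distillation_loss(poor_student, teacher, label=0))
```

A student whose logits track the teacher's should receive the lower loss of the two. In practice `alpha` and `temperature` are tuned per task, and the same loss is averaged over batches inside a normal training loop.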

Business use case

An e-commerce platform uses a large LLM to generate high-quality product descriptions. It distils the large model into a smaller one that runs 10x faster at one-fifth of the cost, then deploys the distilled model in production while keeping the teacher for quality checks.

Example

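Hugging Face's DistilBERT is a distilled version of BERT that is about 40% smaller and 60% faster while retaining roughly 97% of BERT's language-understanding performance. Among open-weight LLMs, Stanford's Alpaca fine-tuned Meta's LLaMA 7B on instruction data generated by an OpenAI model, and many fine-tuned models that appear on open-source leaderboards are similarly trained on outputs from larger proprietary models.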

Frequently asked questions

Model distillation is a process where a small 'student' model learns to mimic the behaviour of a large 'teacher' model by training on the teacher's outputs, capturing much of its knowledge at a fraction of the size.

Smaller models are cheaper and faster to run. Distillation lets companies deploy capable AI at lower inference cost, on edge devices, or within strict latency requirements without retraining from scratch on raw data.

A distilled model is generally not quite as capable as its teacher, but it often retains 90% or more of the teacher's performance on specific tasks at a fraction of the parameter count. The right trade-off depends on the task requirements and budget.

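Distillation and quantization both compress models, but differently. Distillation trains a separate, structurally smaller model. Quantization keeps the existing model's architecture and reduces the numerical precision of its weights, for example from 32-bit floats to 8-bit integers. They are often combined for maximum efficiency.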
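
To make the contrast concrete, here is a minimal sketch of what quantization does to a handful of weights, assuming a simple symmetric 8-bit scheme. The function names and sample weights are hypothetical, and production toolkits use more elaborate schemes. The number of weights stays the same and only their precision drops, whereas distillation produces a different, smaller set of weights altogether.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in quantized]

# Hypothetical weights from an existing (possibly already distilled) model.
weights = [0.82, -1.27, 0.03, 0.5]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# Each restored weight differs from the original by at most half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized, scale, max_error)
```

Because the two techniques act on different things (architecture versus numeric precision), a distilled student can be quantized afterwards without retraining the teacher.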
