Model Distillation
A technique that trains a smaller, faster AI model to replicate the outputs of a larger, more capable one, enabling high-quality performance at lower cost and with reduced compute.
In plain English
Model distillation trains a small, fast AI model to behave like a large, slow one. The result is a cheaper model that's nearly as capable, ideal for running at scale or on devices with limited compute.
Technical definition
In knowledge distillation, a student network is trained to minimize a weighted combination of the standard task loss and a distillation loss computed against the teacher's soft probability outputs, which are typically softened with a temperature parameter. These soft targets carry richer information than one-hot labels because they show how the teacher ranks the incorrect classes as well as the correct one, so the student can approximate the teacher's behaviour more efficiently than by training on ground truth alone.
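The weighted loss described above can be sketched in a few lines of plain Python. This is a minimal illustration for a single training example, not a reference implementation: the function names, the weighting `alpha`, the `temperature` value, and the example logits are all assumptions chosen for clarity. The `temperature ** 2` scaling follows a common convention from the original knowledge distillation paper; other formulations exist.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Standard task loss against a hard (one-hot) label index."""
    return -math.log(probs[label])

def kl_divergence(p, q):
    """KL(p || q): how far the student distribution q is from the teacher distribution p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student_logits, teacher_logits, label, alpha=0.5, temperature=2.0):
    """Weighted sum of the task loss and the soft-target loss.

    alpha and temperature are hypothetical hyperparameters that would be tuned per task.
    """
    # Task loss: student predictions at temperature 1 versus the ground-truth label.
    task = cross_entropy(softmax(student_logits), label)
    # Distillation loss: student versus teacher, both softened by the same temperature.
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    distill = temperature ** 2 * kl_divergence(soft_teacher, soft_student)
    return alpha * task + (1 - alpha) * distill

# Made-up logits for a three-class problem where class 0 is correct.
teacher_logits = [4.0, 2.5, 0.5]
student_logits = [2.0, 1.5, 1.0]
loss = distillation_loss(student_logits, teacher_logits, label=0)
print(f"combined loss: {loss:.4f}")
```

In a real training loop the same quantity would be computed per batch by a deep learning framework and backpropagated through the student only; the teacher's weights stay frozen.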
Business use case
An e-commerce platform uses a large LLM to generate high-quality product descriptions. It distils that model into a smaller one that runs 10x faster at one-fifth of the cost, then deploys the distilled model in production while keeping the teacher for quality checks.
Example
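DistilBERT, released by Hugging Face, is a distilled version of BERT that is about 40% smaller and 60% faster while retaining roughly 97% of BERT's language-understanding performance. Among LLMs, Stanford's Alpaca fine-tuned Meta's LLaMA 7B on instruction-following examples generated by a larger OpenAI model, and many fine-tuned models on open-source leaderboards are similarly trained on, or influenced by, the outputs of larger proprietary models.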
Frequently asked questions
What is model distillation?
Model distillation is a process where a small 'student' model learns to mimic the behaviour of a large 'teacher' model by training on the teacher's outputs, capturing much of its knowledge at a fraction of the size.
Why use model distillation?
Smaller models are cheaper and faster to run. Distillation lets companies deploy capable AI at lower inference cost, on edge devices, or within strict latency requirements without retraining from scratch on raw data.
Is a distilled model as capable as the original?
Generally not quite, but distilled models can often retain 90% or more of the teacher's performance on specific tasks at a fraction of the parameter count. The right trade-off depends on the task requirements and budget.
How is distillation different from quantization?
Both compress models, but differently. Distillation trains a new, structurally smaller model. Quantization reduces the numerical precision of an existing model's weights without changing its architecture. The two are often combined for maximum efficiency.
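To make the contrast with quantization concrete, here is a minimal sketch of symmetric 8-bit quantization applied to a handful of weights. It is an illustration under simplifying assumptions (one scale for the whole list, made-up weight values, hypothetical function names) rather than how any particular library implements it. No training is involved and the number of weights does not change, whereas distillation produces a new model with fewer parameters.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [v * scale for v in quantized]

# Made-up weights for illustration only.
weights = [0.82, -1.27, 0.05, 0.4]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# Each restored weight differs from the original by at most half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized, f"scale={scale:.4f}", f"max error={max_error:.6f}")
```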
Keep exploring
Machine Learning
Machine learning is a way for computers to learn from examples instead of being told exact rules. The more relevant data they see, the better they get at making predictions.
Large Language Model
A large language model is an AI trained on huge amounts of text so it can read your question and write a useful answer. It powers chatbots and writing assistants.
Fine-Tuning
Fine-tuning is taking a model that already knows a lot and giving it extra training on your own examples. This teaches it to do a specific job better, such as writing in your brand's voice.