Question 1

What is AI inference?

Accepted Answer

Inference is the stage in which a trained model receives a new input and produces an output, for example a language model generating a reply to a user prompt. It is the 'using' stage that follows the 'training' stage.

Question 2

How is inference different from training?

Accepted Answer

Training adjusts the model's weights using large datasets; it is computationally intensive and done periodically. Inference uses fixed weights to process new inputs and typically runs in real time at scale, often handling millions of requests per day.
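
The distinction can be illustrated with a toy model. The following is a minimal sketch, assuming a one-variable linear model fitted by plain gradient descent; the function names (`train`, `predict`), the learning rate, and the sample data are made up for illustration and do not describe any particular production system.

```python
def predict(w, b, x):
    """Inference: apply fixed weights to a new input. Nothing is updated."""
    return w * x + b


def train(data, lr=0.05, epochs=2000):
    """Training: repeatedly adjust the weights to reduce mean squared error."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in data:
            err = predict(w, b, x) - y
            grad_w += 2 * err * x
            grad_b += 2 * err
        # Gradient descent step: this is the only place the weights change.
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Illustrative data generated from y = 2x + 1 (an assumption for this sketch).
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

# Training: many passes over the dataset, done once (or periodically).
w, b = train(data)

# Inference: one cheap forward computation per new input, weights unchanged.
answer = predict(w, b, 10.0)
```

Here `train` loops over the whole dataset thousands of times, while each call to `predict` is a single arithmetic step. Real models differ enormously in scale, but the same asymmetry holds.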

Question 3

Why does inference cost matter?

Accepted Answer

Inference is the ongoing operational cost of AI. Training is an occasional, largely up-front expense, whereas inference runs continuously as users interact with the model, so its cost grows with usage. Optimizing inference speed and cost therefore directly affects product economics.
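
A rough back-of-the-envelope calculation shows why usage-driven cost adds up. This is a sketch only: it assumes a simple per-token pricing model, and every number below (request volume, tokens per request, price, training cost) is a made-up placeholder rather than a real figure.

```python
import math


def daily_inference_cost(requests_per_day, tokens_per_request, price_per_million_tokens):
    """Daily spend under an assumed flat price per million tokens processed."""
    total_tokens = requests_per_day * tokens_per_request
    return total_tokens / 1_000_000 * price_per_million_tokens


def days_until_inference_matches(training_cost, daily_cost):
    """Days of serving after which cumulative inference spend reaches the training cost."""
    return math.ceil(training_cost / daily_cost)


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
daily = daily_inference_cost(
    requests_per_day=1_000_000,
    tokens_per_request=500,
    price_per_million_tokens=2.0,
)
breakeven_days = days_until_inference_matches(training_cost=100_000, daily_cost=daily)
```

With these placeholder values the daily cost is 1,000 currency units, so cumulative inference spend matches the assumed training cost after 100 days and keeps growing from there.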

Question 4

What affects inference speed?

Accepted Answer

Model size, hardware (GPU vs CPU vs specialized accelerators), quantization, caching, batching, and the length of the input and output all affect how fast inference runs and how much it costs.
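
Two of these factors lend themselves to simple arithmetic. The sketch below assumes that weight memory is just parameter count times bits per weight, and that latency splits into an input-processing phase and an output-generation phase with fixed throughputs. Both are deliberate simplifications that ignore batching, caching, and queueing; the function names and the sample numbers are hypothetical.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def estimate_latency_s(input_tokens, output_tokens,
                       input_tokens_per_s, output_tokens_per_s):
    """First-order latency: time to read the input plus time to generate the output."""
    return input_tokens / input_tokens_per_s + output_tokens / output_tokens_per_s


# Quantization: the same hypothetical 7-billion-parameter model at two precisions.
fp16_gb = weight_memory_gb(7e9, 16)  # 16-bit weights
int4_gb = weight_memory_gb(7e9, 4)   # 4-bit quantized weights

# Input/output length: assumed throughputs, not measurements of any real hardware.
latency = estimate_latency_s(
    input_tokens=1000,
    output_tokens=200,
    input_tokens_per_s=5000,
    output_tokens_per_s=50,
)
```

Under these assumptions, dropping from 16-bit to 4-bit weights cuts weight memory from 14 GB to 3.5 GB, and the 200 output tokens account for 4.0 of the estimated 4.2 seconds.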

