Inference is what happens when a trained AI model is actually put to use: it receives an input (a prompt, image, or query) and generates an output (text, code, a prediction, or a classification). Inference is distinct from training, the computationally intensive process of building the model. In production applications, inference speed and cost are critical considerations. Cloud AI providers charge for inference usage; for language models this is typically measured in tokens processed, which makes efficient prompting an important cost-management practice.
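To make the link between token counts and cost concrete, here is a minimal sketch of the arithmetic. It assumes a provider that bills input and output tokens at separate per-million-token rates. The `TokenPricing` class, the `estimate_cost` function, and the prices themselves are hypothetical and chosen only for illustration; they are not any real provider's rates or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPricing:
    """Hypothetical prices in USD per one million tokens (not real rates)."""
    input_per_million: float
    output_per_million: float


def estimate_cost(input_tokens: int, output_tokens: int, pricing: TokenPricing) -> float:
    """Return the estimated USD cost of a single inference request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    )
    return total / 1_000_000


# Assumed prices for illustration only.
EXAMPLE_PRICING = TokenPricing(input_per_million=2.0, output_per_million=8.0)

# Same task and same response length, but the second prompt is trimmed
# from 1,200 to 400 input tokens.
verbose = estimate_cost(1_200, 300, EXAMPLE_PRICING)
concise = estimate_cost(400, 300, EXAMPLE_PRICING)

print(f"verbose prompt: ${verbose:.4f} per request")
print(f"concise prompt: ${concise:.4f} per request")
print(f"savings over 1,000,000 requests: ${(verbose - concise) * 1_000_000:,.2f}")
```

Under these assumed prices, the per-request difference looks negligible, but it scales linearly with request volume, which is why prompt length matters in production.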