Context Window
The maximum amount of text an AI model can read and consider at one time, measured in tokens, which determines how much conversation history, documents, or instructions it can use.
In plain English
A context window is like an AI model's working memory. It determines how much text the model can read at once — the larger the window, the more it can consider when generating a reply.
Technical definition
The context window is the maximum sequence length, measured in tokens, that a transformer model can process during a single forward pass. Both the input prompt and the generated output consume window space. Self-attention in standard transformers has a compute cost that grows quadratically with sequence length, which has historically limited context size, though architectural innovations continue to extend practical limits.
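The two points in this definition, the shared input/output budget and the quadratic growth of attention cost, can be sketched as simple arithmetic. The function names below are hypothetical and chosen only for illustration. The cost ratio covers only the attention-score computation in a standard transformer, not total inference cost.

```python
def fits_in_window(prompt_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """Return True if the prompt plus the reserved output fits in the window.

    Assumption: input and output share one token budget, as described above.
    """
    return prompt_tokens + max_output_tokens <= context_window


def max_prompt_tokens(context_window: int, max_output_tokens: int) -> int:
    """Largest prompt that still leaves room for the reserved output."""
    return max(0, context_window - max_output_tokens)


def relative_attention_cost(old_length: int, new_length: int) -> float:
    """Rough ratio of attention-score work when sequence length changes.

    Assumption: cost grows with the square of sequence length, which holds
    for standard (full) self-attention but not for every architecture.
    """
    if old_length <= 0 or new_length <= 0:
        raise ValueError("sequence lengths must be positive")
    return (new_length / old_length) ** 2


if __name__ == "__main__":
    # A 32,000-token window with 2,000 tokens reserved for the reply.
    print(max_prompt_tokens(32_000, 2_000))
    print(fits_in_window(50_000, 2_000, 32_000))
    # Growing the sequence from 32,000 to 200,000 tokens.
    print(relative_attention_cost(32_000, 200_000))
```

Reserving output space up front is a design choice. It avoids a reply being cut short because the prompt used the whole window.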
Business use case
Developers building document-analysis tools must choose models with context windows large enough to fit the documents they need to process. Call-centre teams using AI assistants need large contexts to hold the full conversation history and relevant policy documents simultaneously.
Example
If a contract is 50,000 tokens long but the model's context window is 32,000 tokens, the contract must be split into chunks. Using a model with a 200,000-token context window allows it to process the full contract in one pass.
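A minimal sketch of the splitting step, assuming the document has already been turned into a list of tokens. The function name chunk_tokens, the overlap parameter, and the 2,000 tokens reserved for instructions and output are assumptions for illustration. A real pipeline would count tokens with the model's own tokenizer.

```python
from typing import List, Sequence


def chunk_tokens(tokens: Sequence[str], chunk_size: int, overlap: int = 0) -> List[List[str]]:
    """Split a token sequence into consecutive chunks of at most chunk_size.

    overlap repeats the last few tokens of one chunk at the start of the
    next, so that text straddling a boundary appears whole in one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        # Stop once this chunk has reached the end of the document.
        if start + chunk_size >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    # Placeholder tokens standing in for a 50,000-token contract.
    contract = ["tok"] * 50_000
    # 32,000-token window minus an assumed 2,000 tokens for instructions and output.
    chunks = chunk_tokens(contract, chunk_size=30_000)
    print([len(chunk) for chunk in chunks])
```

Under these assumptions the contract splits into two chunks. A model with a 200,000-token window would take it as a single chunk.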
Frequently asked questions
The context window is the maximum number of tokens a model can process in a single request, including both the input (your prompt and any documents) and the output (the model's response).
A larger context window lets the model read and reason over more content at once — longer documents, more conversation history, or more detailed instructions — without losing earlier information.
When the input exceeds the context window, the model can no longer see the text that was cut off. Depending on the implementation, earlier content is either dropped or the request fails with an error, which is why chunking and summarisation strategies matter for long-document tasks.
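The "earlier content is dropped" behaviour can be sketched as a trimming step that keeps only the most recent messages within a token budget. This is one possible strategy, not how any particular product works. The names trim_history and count_tokens_approx are hypothetical, and the whitespace-based counter is a crude stand-in for a real tokenizer.

```python
from typing import Callable, List


def count_tokens_approx(text: str) -> int:
    """Very rough token count: one token per whitespace-separated word.

    Assumption: a stand-in only; real tokenizers split text differently.
    """
    return len(text.split())


def trim_history(
    messages: List[str],
    budget: int,
    count: Callable[[str], int] = count_tokens_approx,
) -> List[str]:
    """Keep the newest messages whose combined token count fits the budget.

    Walks backwards from the latest message and stops at the first one
    that would overflow, so everything older than that point is dropped.
    """
    kept: List[str] = []
    used = 0
    for message in reversed(messages):
        cost = count(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # Restore chronological order.
    return kept


if __name__ == "__main__":
    history = ["hello there agent", "order status", "where is my parcel"]
    print(trim_history(history, budget=6))
```

Summarising the dropped messages instead of discarding them is a common alternative when earlier details still matter.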
Context windows vary significantly across models. Some are measured in thousands of tokens while others reach hundreds of thousands. Larger windows are generally better for tasks requiring a model to read entire reports or codebases at once.
Keep exploring
Large Language Model
A large language model is an AI trained on huge amounts of text so it can read your question and write a useful answer. It powers chatbots and writing assistants.
Token
A token is a small piece of text, like a word or part of a word, that an AI model uses as its basic unit of text. Models count tokens to measure how much text they can handle.
Retrieval-Augmented Generation
Retrieval-augmented generation lets an AI look up relevant facts before answering. Instead of relying only on memory, it pulls in the right documents and uses them to give a more accurate reply.