Cursor is offering a wide range of models, including the latest state-of-the-art models.

Pricing

All model usage is counted and billed in requests. With the Pro plan, you get 500 requests per month. Cursor offers two modes of usage:

Normal

Requests per model/message

Ideal for everyday coding tasks, recommended for most users.

Max

Requests per 1M tokens (MTok)

Best for complex reasoning, hard bugs, and agentic tasks.

Request

A request represents a single message sent to the model, which includes your message, any relevant context from your codebase, and the model’s response.

One request costs $0.04.
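
As a quick sanity check on the numbers above, the arithmetic can be sketched like this (`usage_cost` is a hypothetical helper, not part of Cursor):

```python
# Hypothetical helper illustrating the request pricing described above.
REQUEST_PRICE_USD = 0.04  # one request costs $0.04

def usage_cost(requests: int) -> float:
    """Dollar value of a given number of requests."""
    return requests * REQUEST_PRICE_USD

# The Pro plan's 500 monthly requests correspond to $20 of usage:
print(usage_cost(500))
```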

Slow requests

Slow requests automatically activate when you run out of normal requests. These requests are processed at a lower priority, meaning they are slower and you may experience longer delays compared to normal requests.

Slow requests are not available for Max mode.

Normal mode

In normal mode, each message costs a fixed number of requests based solely on the model you’re using, regardless of context. We optimize context management without it affecting your request count.

For example, let’s look at a conversation using Claude 3.5 Sonnet, where each message costs 1 request:

| Role | Message | Cost per message |
| --- | --- | --- |
| User | Create a plan for this change (using a more expensive model) | 1 |
| Cursor | I’ll analyze the requirements and create a detailed implementation plan… | 0 |
| User | Implement the changes with TypeScript and add error handling | 1 |
| Cursor | Here’s the implementation with type safety and error handling… | 0 |
| Total | | 2 requests |
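
The billing rule above reduces to simple multiplication: only user messages are billed, at the model's fixed per-message cost (a hypothetical `normal_mode_requests` helper, assuming the 1-request-per-message rate shown for Claude 3.5 Sonnet):

```python
# Sketch of Normal mode billing as described above: each user message
# costs a fixed number of requests for the chosen model, and the
# model's responses cost nothing.
def normal_mode_requests(user_messages: int, cost_per_message: int = 1) -> int:
    """Total requests for a conversation in Normal mode."""
    return user_messages * cost_per_message

# The two-message conversation in the table above:
print(normal_mode_requests(2))
```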

Max Mode

In Max mode, usage is measured in tokens and converted to requests. This includes tokens from your messages, code files, folders, tool calls (like file reads and searches), and any other context provided to the model.

Since usage is based on tokens, a single message can cost more or less than one request. See the example below.

We use the same tokenizers as the model providers (e.g. OpenAI’s tokenizer for GPT models, Anthropic’s for Claude models) to ensure accurate token counting. You can see an example using OpenAI’s tokenizer demo.

For example, if you send a prompt with 135k input tokens (priced at 20 requests per 1M tokens) and receive an 82k token response (priced at 15 requests per 1M tokens), the total cost is calculated from both input and output tokens, as shown in the table below.

| Token | Model cost (requests/1M tokens) | Tokens used | Calculation | Total requests |
| --- | --- | --- | --- | --- |
| Input | 20 | 135k | (20 / 1M) × 135,000 | 2.7 requests |
| Output | 15 | 82k | (15 / 1M) × 82,000 | 1.23 requests |
| Total | | | | 3.93 requests |
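
The same conversion can be written as a small function (a hypothetical `max_mode_requests` helper; the default rates are the ones from this example and vary by model):

```python
# Sketch of the Max mode calculation above: token counts are converted
# to requests at per-model rates quoted in requests per 1M tokens.
def max_mode_requests(input_tokens: int, output_tokens: int,
                      input_rate: float = 20, output_rate: float = 15) -> float:
    """Total requests for a Max mode message at the given token rates."""
    return ((input_rate / 1_000_000) * input_tokens
            + (output_rate / 1_000_000) * output_tokens)

# 135k input + 82k output tokens, as in the table above:
print(round(max_mode_requests(135_000, 82_000), 2))
```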

Use the table below to calculate the total requests for your prompt.

Models

Auto-select

Enabling Auto-select configures Cursor to pick the premium model best suited to the task at hand, with the highest reliability based on current demand. This feature can detect degraded output performance and automatically switch models to resolve it.

Recommended for most users

Capabilities

Thinking

Enabling Thinking limits the list of models to reasoning models which think through problems step-by-step and have deeper capacity to examine their own reasoning and correct errors.

These models often perform better on complex reasoning tasks, though they may require more time to generate their responses.

Agentic

Agentic models can be used with Chat’s Agent mode. These models are highly capable at making tool calls and perform best with Agent.

Submitting an Agent prompt that makes up to 25 tool calls consumes one request. If your request extends beyond 25 tool calls, Cursor will ask whether you'd like to continue, which consumes a second request.
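
Under that rule, the request count grows with blocks of 25 tool calls. A hypothetical `agent_requests` helper sketches this; note the source only describes the first continuation, so extending the pattern beyond a second block is an assumption:

```python
import math

# Sketch of the Agent tool-call rule above, assuming each block of up
# to 25 tool calls consumes one request (Cursor prompts the user
# before continuing past each block).
def agent_requests(tool_calls: int) -> int:
    """Requests consumed by an Agent prompt with the given tool calls."""
    return max(1, math.ceil(tool_calls / 25))

print(agent_requests(25))  # one block of tool calls
print(agent_requests(26))  # extends past 25, so a second request
```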

Max Mode

Some models support Max Mode, which is designed for the most complex and challenging tasks. Learn more about Max Mode.

Context windows

A context window is the maximum span of tokens (text and code) an LLM can consider at once, including both the input prompt and output generated by the model.

Each chat in Cursor maintains its own context window. The more prompts, attached files, and responses included in a session, the larger the context window grows.

Cursor actively optimizes the context window as the chat session progresses, intelligently pruning non-essential content while preserving critical code and conversation elements.

For best results, it’s recommended you take a purpose-based approach to chat management, starting a new session for each unique task.

Hosting

Models are hosted on US-based infrastructure by the model’s provider, a trusted partner, or Cursor itself.

When Privacy Mode is enabled in Settings, neither Cursor nor the model providers store your data; all data is deleted after each request is processed. For further details, see our Privacy, Privacy Policy, and Security pages.

FAQ

What is a request?

A request is the message you send to the model.

What is a token?

A token is the smallest unit of text that can be processed by a model.