Models & Pricing
Available models in Cursor and their pricing
Cursor offers a wide range of models, including the latest state-of-the-art models.
Pricing
All model usage is counted and billed in requests. The Pro plan includes 500 requests per month. Cursor offers two modes of usage:
Normal
Requests per model/message
Ideal for everyday coding tasks, recommended for most users.
Max
Requests per 1M tokens (MTok)
Best for complex reasoning, hard bugs, and agentic tasks.
Request
A request represents a single message sent to the model, which includes your message, any relevant context from your codebase, and the model’s response.
One request is $0.04
Slow requests
Slow requests automatically activate when you run out of normal requests. These requests are processed at a lower priority, meaning they are slower and you may experience longer delays compared to normal requests.
Normal mode
In normal mode, each message costs a fixed number of requests based solely on the model you’re using, regardless of context. We optimize context management without it affecting your request count.
For example, let’s look at a conversation using Claude 3.5 Sonnet, where each message costs 1 request:
Role | Message | Cost per message |
---|---|---|
User | Create a plan for this change (using a more expensive model) | 1 |
Cursor | I’ll analyze the requirements and create a detailed implementation plan… | 0 |
User | Implement the changes with TypeScript and add error handling | 1 |
Cursor | Here’s the implementation with type safety and error handling… | 0 |
Total | | 2 requests |
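The accounting above can be sketched in a few lines. This is an illustrative model, not Cursor's implementation; the per-model rate table and model name are assumptions for the example.

```python
# Illustrative normal-mode accounting: each user message costs a fixed
# number of requests determined only by the model, never by context size.
REQUESTS_PER_MESSAGE = {"claude-3.5-sonnet": 1}  # assumed rate for this example

def normal_mode_requests(model: str, user_messages: int) -> int:
    """Total requests for a conversation with the given number of user messages."""
    return REQUESTS_PER_MESSAGE[model] * user_messages

total = normal_mode_requests("claude-3.5-sonnet", 2)  # the 2-message example above
cost_usd = total * 0.04  # one request is $0.04
```

With two user messages, this yields 2 requests, or $0.08.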
Max Mode
In Max mode, usage is measured in tokens and converted to requests. This includes tokens from your messages, code files, folders, tool calls (like file reads and searches), and any other context provided to the model.
Since Max mode usage is based on tokens, the request cost of each message varies with the amount of context it includes. See the example below.
We use the same tokenizers as the model providers (e.g. OpenAI’s tokenizer for GPT models, Anthropic’s for Claude models) to ensure accurate token counting. You can see an example using OpenAI’s tokenizer demo.
For example, if you send a prompt with 135k input tokens (priced at 20 requests per 1M tokens) and receive an 82k token response (priced at 15 requests per 1M tokens), the total cost is calculated from both input and output tokens, as shown in the table below.
Token | Model cost (requests/1M tokens) | Tokens used | Calculation | Total requests |
---|---|---|---|---|
Input | 20 | 135k | (20 / 1M) × 135000 | 2.7 requests |
Output | 15 | 82k | (15 / 1M) × 82000 | 1.23 requests |
Total | | | | 3.93 requests |
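The same conversion can be written as a small helper. This is a sketch of the arithmetic shown in the table, with the rates and token counts taken from the example above.

```python
def max_mode_requests(input_tokens: int, output_tokens: int,
                      input_rate_per_mtok: float, output_rate_per_mtok: float) -> float:
    """Convert token usage to requests using per-1M-token (MTok) rates."""
    input_requests = input_rate_per_mtok * input_tokens / 1_000_000
    output_requests = output_rate_per_mtok * output_tokens / 1_000_000
    return input_requests + output_requests

# Values from the table: 135k input at 20 req/MTok, 82k output at 15 req/MTok.
total = max_mode_requests(135_000, 82_000, 20, 15)
print(round(total, 2))  # 3.93
```

Input contributes 2.7 requests and output 1.23, matching the table's total of 3.93.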
Use the table below to calculate the total requests for your prompt.
Models
Auto-select
Enabling Auto-select configures Cursor to choose the premium model best suited to the immediate task, with the highest reliability based on current demand. This feature can detect degraded output performance and automatically switch models to resolve it.
Capabilities
Thinking
Enabling Thinking limits the list of models to reasoning models, which think through problems step by step and are better able to examine and correct their own reasoning.
These models often perform better on complex reasoning tasks, though they may require more time to generate their responses.
Agentic
Agentic models can be used with Chat’s Agent mode. These models are highly capable at making tool calls and perform best with Agent.
Submitting an Agent prompt with up to 25 tool calls consumes one request. If your request extends beyond 25 tool calls, Cursor will ask if you'd like to continue, which consumes a second request.
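Under the accounting described above, requests scale with batches of 25 tool calls. A minimal sketch, assuming each started batch of 25 costs one request (this batching rule is an interpretation of the paragraph, not a documented formula):

```python
import math

def agent_requests(tool_calls: int, batch_size: int = 25) -> int:
    """Requests consumed by an Agent prompt, assuming one request
    per started batch of `batch_size` tool calls (minimum one)."""
    return max(1, math.ceil(tool_calls / batch_size))

agent_requests(10)  # 1 request
agent_requests(30)  # 2 requests (Cursor asks before continuing past 25)
```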
Max Mode
Some models support Max Mode, which is designed for the most complex and challenging tasks. Learn more about Max Mode.
Context windows
A context window is the maximum span of tokens (text and code) an LLM can consider at once, including both the input prompt and output generated by the model.
Each chat in Cursor maintains its own context window. The more prompts, attached files, and responses included in a session, the larger the context window grows.
Cursor actively optimizes the context window as the chat session progresses, intelligently pruning non-essential content while preserving critical code and conversation elements.
For best results, it’s recommended you take a purpose-based approach to chat management, starting a new session for each unique task.
Hosting
Models are hosted on US-based infrastructure by the model's provider, a trusted partner, or Cursor itself.
When Privacy Mode is enabled from Settings, neither Cursor nor the model providers will store your data, with all data deleted after each request is processed. For further details see our Privacy, Privacy Policy, and Security pages.
FAQ
What is a request?
A request is a single message sent to the model, including your message, any attached context, and the model's response.
What is a token?
A token is the smallest unit of text that can be processed by a model.