
Tokenisation in AI/LLMs: How It Works and How Pricing Is Calculated


Artificial intelligence models such as OpenAI's GPT and Anthropic's Claude are widely used in applications like chatbots, automation tools, content generation, and analytics.

These models do not process text the way humans read it.

They work with tokens.


Understanding tokenisation is essential because it directly affects:

  • 💰 Cost

  • ⚡ Performance

  • 📏 Input and output limits


🔹 What is Tokenisation?


Tokenisation is the process of breaking text into smaller units called tokens.

A token can be:

  • A full word

  • Part of a word

  • A character

  • A symbol


Input:

Builddevops is amazing!

Possible tokens:

["Build", "dev", "ops", " is", " amazing", "!"]

Key points:

  • Tokens are not always complete words

  • Spaces are often included

  • Words may be split into smaller parts


🔹 Why Tokenisation is Needed


AI models operate on numbers, not text.

The processing pipeline looks like this:

Text → Tokenisation → Token IDs → Model computation → Output Token IDs → Decoded Text


⚙️ How It Works Internally



Everything inside the model works using numerical representations of tokens.
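As a toy illustration, the text-to-IDs round trip can be sketched with a hand-made vocabulary (the tokens and IDs below are invented for this example; real models use learned vocabularies with tens of thousands of entries):

```python
# Invented toy vocabulary; real models map tens of thousands of subwords to IDs
vocab = {"Builddevops": 0, " is": 1, " amazing": 2, "!": 3}
id_to_token = {i: t for t, i in vocab.items()}

tokens = ["Builddevops", " is", " amazing", "!"]
ids = [vocab[t] for t in tokens]                # text -> token IDs (what the model sees)
decoded = "".join(id_to_token[i] for i in ids)  # token IDs -> text

print(ids)      # [0, 1, 2, 3]
print(decoded)  # Builddevops is amazing!
```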


🔹 Types of Tokenisation


1. Word-Based

Input : "I love AI"
Output: ["I", "love", "AI"]

2. Subword Tokenisation (Most Common)

Input : "unbelievable"
Output: ["un", "believ", "able"]

This approach:

  • Handles unknown words

  • Reduces vocabulary size

  • Improves efficiency

3. Character-Based

Input : "AI"
Output: ["A", "I"]
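The word-based and character-based schemes above can be reproduced with plain Python string operations (a simplification; real tokenisers also handle punctuation and whitespace carefully):

```python
text = "I love AI"

# Word-based: split on whitespace
word_tokens = text.split()
print(word_tokens)  # ['I', 'love', 'AI']

# Character-based: every character becomes a token
char_tokens = list("AI")
print(char_tokens)  # ['A', 'I']
```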

🔹 Tokenisation Algorithms


Common techniques include:

  • Byte Pair Encoding (BPE) – used in GPT models

  • WordPiece – used by Google BERT

  • SentencePiece – language-independent
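A minimal sketch of BPE's core loop, assuming a tiny three-word corpus: count adjacent symbol pairs, then merge the most frequent pair into one symbol. Real implementations repeat this thousands of times over large corpora.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words (the core BPE statistic)."""
    pairs = Counter()
    for symbols in words:
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += 1
    return max(pairs, key=pairs.get)

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = []
    for symbols in words:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged.append(out)
    return merged

# Start from individual characters and apply two merges
corpus = [list("lower"), list("lowest"), list("low")]
for _ in range(2):
    corpus = merge_pair(corpus, most_frequent_pair(corpus))
print(corpus)  # "low" emerges as a learned subword
```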


🔹 Tokens vs Words

Text Type      Approx Tokens
1 word         ~1.3 tokens
1 sentence     10–20 tokens
1 paragraph    ~100 tokens

👉 Rule of thumb:

1 token ≈ 4 characters in English
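The rule of thumb can be turned into a quick estimator (an approximation only; use a real tokeniser when billing-accurate counts matter):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token rule of thumb."""
    return max(1, round(len(text) / 4))

print(estimate_tokens("Builddevops is amazing!"))  # 23 chars -> ~6 tokens
```

Here the estimate happens to match the exact count of 6 produced by a real tokeniser later in this article.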

🔹 Context Window (Token Limit)


Each model has a maximum number of tokens it can process in one request.

This includes:

  • Input tokens

  • Output tokens


If the total exceeds the limit:

  • Input may be truncated

  • Or the request may fail
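An application can guard against this with a simple budget check before sending a request; a minimal sketch, assuming a hypothetical 8,000-token window:

```python
def fits_context(input_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """The prompt plus the reserved output budget must fit in one window."""
    return input_tokens + max_output_tokens <= context_window

# Hypothetical 8,000-token context window
print(fits_context(7000, 500, 8000))  # True
print(fits_context(7900, 500, 8000))  # False: truncation or failure
```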


💰 How Pricing Works in LLMs


Pricing is based on the number of tokens processed.

Both are counted:

  • Input tokens

  • Output tokens


✅ Pricing Formula

Total Cost = (Input Tokens / 1000 × Input Price per 1K tokens)
           + (Output Tokens / 1000 × Output Price per 1K tokens)

Important:

  • Prices are typically quoted per 1,000 (sometimes per 1,000,000) tokens

  • Never multiply the raw token count by the per-1K price; divide the count by 1,000 first
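The formula translates directly into a small helper function:

```python
def llm_cost(input_tokens, output_tokens, input_price_per_1k, output_price_per_1k):
    """Total cost: input and output tokens, each billed at their per-1K rate."""
    return (input_tokens / 1000) * input_price_per_1k + \
           (output_tokens / 1000) * output_price_per_1k

# Example: 100 input and 200 output tokens at $0.01 / $0.02 per 1K
print(f"${llm_cost(100, 200, 0.01, 0.02):.4f}")  # $0.0050
```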


🔍 Example Calculation


Assume:

  • Input tokens = 100

  • Output tokens = 200


Pricing:

  • Input = $0.01 per 1000 tokens

  • Output = $0.02 per 1000 tokens


Step 1: Input Cost

(100 / 1000) × 0.01 = 0.001

Step 2: Output Cost

(200 / 1000) × 0.02 = 0.004

✅ Final Cost

Total = 0.001 + 0.004 = $0.005

⚠️ Common Mistake

Incorrect calculation (treating the per-1K price as a per-token price):

100 × 0.01 = $1.00

Correct approach:

(100 / 1000) × 0.01 = $0.001

🔹 Why Tokens Matter


1. Cost

More tokens increase usage cost.

2. Performance

Higher token count can increase response time.

3. Limits

Exceeding token limits can cause failures or truncated responses.


🔹 Practical Examples


Short Input

"Summarize this article"

Low token usage → low cost

Large Input

"Analyze 10,000 lines of logs or text data"

High token usage → higher cost


🔹 Optimisation Tips


✔️ Remove unnecessary text

✔️ Avoid repeating information

✔️ Keep prompts concise

✔️ Limit output length

✔️ Preprocess large inputs before sending
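A toy preprocessor illustrating the first and last tips, collapsing whitespace and capping input length (the 400-character cap is an arbitrary example value; real pipelines might also deduplicate lines or summarise):

```python
import re

def shrink_prompt(text: str, max_chars: int = 400) -> str:
    """Collapse repeated whitespace, then cap the prompt length."""
    text = re.sub(r"\s+", " ", text).strip()  # remove unnecessary whitespace
    return text[:max_chars]                   # crude cap on large inputs

raw = "Analyze   these \n\n log lines " * 100
print(len(shrink_prompt(raw)))  # capped at 400 characters
```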


🔹 Simple Token Counting Example

[root@siddhesh ~]# cat /usr/local/bin/Token_Counting.py
import tiktoken

# Load the tokenizer used by GPT-4
enc = tiktoken.encoding_for_model("gpt-4")
text = "Builddevops is amazing!"

tokens = enc.encode(text)
print("==== Full Sentence ====")
print("Text:", text)
print("Token count:", len(tokens))
print("Tokens:", tokens)

print("\n==== Word-wise Tokens ====")
words = text.split()
for word in words:
    word_tokens = enc.encode(word)
    decoded = [enc.decode([t]) for t in word_tokens]
    print(f"\nWord: {word}")
    print(f"Token count: {len(word_tokens)}")
    print(f"Token IDs: {word_tokens}")
    print(f"Decoded Tokens: {decoded}")
[root@siddhesh ~]#

Output

[root@siddhesh ~]# python3 /usr/local/bin/Token_Counting.py
==== Full Sentence ====
Text: Builddevops is amazing!
Token count: 6
Tokens: [11313, 3667, 3806, 374, 8056, 0]
==== Word-wise Tokens ====
Word: Builddevops
Token count: 3
Token IDs: [11313, 3667, 3806]
Decoded Tokens: ['Build', 'dev', 'ops']
Word: is
Token count: 1
Token IDs: [285]
Decoded Tokens: ['is']
Word: amazing!
Token count: 3
Token IDs: [309, 6795, 0]
Decoded Tokens: ['am', 'azing', '!']
[root@siddhesh ~]#

🚀 Key Takeaways


✔️ AI models process tokens, not words

✔️ Tokenisation converts text into numerical form

✔️ Pricing depends on total tokens used

✔️ Always divide tokens by 1000 for cost calculation

✔️ Managing tokens improves efficiency and cost control


Conclusion

Tokenisation is a fundamental concept in modern AI systems. It affects how models understand input, generate output, and calculate usage costs.

A clear understanding of tokens helps in designing efficient, scalable, and cost-aware AI applications.
