
Tokenisation in AI/LLMs: How It Works and How Pricing Is Calculated


Artificial intelligence models such as OpenAI's GPT and Anthropic's Claude are widely used in applications like chatbots, automation tools, content generation, and analytics.

These models do not process text the way humans read it.

They work with tokens.


Understanding tokenisation is essential because it directly affects:

  • 💰 Cost

  • ⚡ Performance

  • 📏 Input and output limits


🔹 What is Tokenisation?


Tokenisation is the process of breaking text into smaller units called tokens.

A token can be:

  • A full word

  • Part of a word

  • A character

  • A symbol


Input:

Builddevops is amazing!

Possible tokens:

["Build", "dev", "ops", " is", " amazing", "!"]

Key points:

  • Tokens are not always complete words

  • Spaces are often included

  • Words may be split into smaller parts


🔹 Why Tokenisation is Needed


AI models operate on numbers, not text.

The processing pipeline looks like this:

Text → Tokenisation → Token IDs → Model computation → Output Token IDs → Decoded Text


⚙️ How It Works Internally



Everything inside the model works using numerical representations of tokens.
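As a toy illustration, the text-to-IDs round trip can be sketched with a hand-made vocabulary (the tokens and IDs below are invented for this example; real models use learned vocabularies with tens of thousands of entries):

```python
# Invented toy vocabulary; real models map tens of thousands of subwords to IDs
vocab = {"Builddevops": 0, " is": 1, " amazing": 2, "!": 3}
id_to_token = {i: t for t, i in vocab.items()}

tokens = ["Builddevops", " is", " amazing", "!"]
ids = [vocab[t] for t in tokens]                # text -> token IDs (what the model sees)
decoded = "".join(id_to_token[i] for i in ids)  # token IDs -> text

print(ids)      # [0, 1, 2, 3]
print(decoded)  # Builddevops is amazing!
```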


🔹 Types of Tokenisation


1. Word-Based

Input : "I love AI"
Output: ["I", "love", "AI"]

2. Subword Tokenisation (Most Common)

Input : "unbelievable"
Output: ["un", "believ", "able"]

This approach:

  • Handles unknown words

  • Reduces vocabulary size

  • Improves efficiency

3. Character-Based

Input : "AI"
Output: ["A", "I"]
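The word-based and character-based schemes above can be reproduced with plain Python string operations (a simplification; real tokenisers also handle punctuation and whitespace carefully):

```python
text = "I love AI"

# Word-based: split on whitespace
word_tokens = text.split()
print(word_tokens)  # ['I', 'love', 'AI']

# Character-based: every character becomes a token
char_tokens = list("AI")
print(char_tokens)  # ['A', 'I']
```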

🔹 Tokenisation Algorithms


Common techniques include:

  • Byte Pair Encoding (BPE) – used in GPT models

  • WordPiece – used by Google BERT

  • SentencePiece – language-independent
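A minimal sketch of BPE's core loop, assuming a tiny three-word corpus: count adjacent symbol pairs, then merge the most frequent pair into one symbol. Real implementations repeat this thousands of times over large corpora.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words (the core BPE statistic)."""
    pairs = Counter()
    for symbols in words:
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += 1
    return max(pairs, key=pairs.get)

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = []
    for symbols in words:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged.append(out)
    return merged

# Start from individual characters and apply two merges
corpus = [list("lower"), list("lowest"), list("low")]
for _ in range(2):
    corpus = merge_pair(corpus, most_frequent_pair(corpus))
print(corpus)  # "low" emerges as a learned subword
```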


🔹 Tokens vs Words

Text Type      Approx Tokens
1 word         ~1.3 tokens
1 sentence     10–20 tokens
1 paragraph    ~100 tokens

👉 Rule of thumb:

1 token ≈ 4 characters in English
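The rule of thumb can be turned into a quick estimator (an approximation only; use a real tokeniser when billing-accurate counts matter):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token rule of thumb."""
    return max(1, round(len(text) / 4))

print(estimate_tokens("Builddevops is amazing!"))  # 23 chars -> ~6 tokens
```

Here the estimate happens to match the exact count of 6 produced by a real tokeniser later in this article.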

🔹 Context Window (Token Limit)


Each model has a maximum number of tokens it can process in one request.

This includes:

  • Input tokens

  • Output tokens


If the total exceeds the limit:

  • Input may be truncated

  • Or the request may fail
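An application can guard against this with a simple budget check before sending a request; a minimal sketch, assuming a hypothetical 8,000-token window:

```python
def fits_context(input_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """The prompt plus the reserved output budget must fit in one window."""
    return input_tokens + max_output_tokens <= context_window

# Hypothetical 8,000-token context window
print(fits_context(7000, 500, 8000))  # True
print(fits_context(7900, 500, 8000))  # False: truncation or failure
```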


💰 How Pricing Works in LLMs


Pricing is based on the number of tokens processed.

Both are counted:

  • Input tokens

  • Output tokens


✅ Pricing Formula

Total Cost = (Input Tokens / 1000 × Input Price per 1K tokens)
           + (Output Tokens / 1000 × Output Price per 1K tokens)

Important:

  • Prices are typically quoted per 1,000 (sometimes per 1,000,000) tokens

  • Never multiply the raw token count by the per-1K price; divide the count by 1,000 first
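The formula translates directly into a small helper function:

```python
def llm_cost(input_tokens, output_tokens, input_price_per_1k, output_price_per_1k):
    """Total cost: input and output tokens, each billed at their per-1K rate."""
    return (input_tokens / 1000) * input_price_per_1k + \
           (output_tokens / 1000) * output_price_per_1k

# Example: 100 input and 200 output tokens at $0.01 / $0.02 per 1K
print(f"${llm_cost(100, 200, 0.01, 0.02):.4f}")  # $0.0050
```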


🔍 Example Calculation


Assume:

  • Input tokens = 100

  • Output tokens = 200


Pricing:

  • Input = $0.01 per 1000 tokens

  • Output = $0.02 per 1000 tokens


Step 1: Input Cost

(100 / 1000) × 0.01 = 0.001

Step 2: Output Cost

(200 / 1000) × 0.02 = 0.004

✅ Final Cost

Total = 0.001 + 0.004 = $0.005

⚠️ Common Mistake

Incorrect calculation (treating the per-1K price as a per-token price):

100 × 0.01 = $1.00

Correct approach:

(100 / 1000) × 0.01 = $0.001

🔹 Why Tokens Matter


1. Cost

More tokens increase usage cost.

2. Performance

Higher token count can increase response time.

3. Limits

Exceeding token limits can cause failures or truncated responses.


🔹 Practical Examples


Short Input

"Summarize this article"

Low token usage → low cost

Large Input

"Analyze 10,000 lines of logs or text data"

High token usage → higher cost


🔹 Optimisation Tips


✔️ Remove unnecessary text

✔️ Avoid repeating information

✔️ Keep prompts concise

✔️ Limit output length

✔️ Preprocess large inputs before sending
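A toy preprocessor illustrating the first and last tips, collapsing whitespace and capping input length (the 400-character cap is an arbitrary example value; real pipelines might also deduplicate lines or summarise):

```python
import re

def shrink_prompt(text: str, max_chars: int = 400) -> str:
    """Collapse repeated whitespace, then cap the prompt length."""
    text = re.sub(r"\s+", " ", text).strip()  # remove unnecessary whitespace
    return text[:max_chars]                   # crude cap on large inputs

raw = "Analyze   these \n\n log lines " * 100
print(len(shrink_prompt(raw)))  # capped at 400 characters
```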


🔹 Simple Token Counting Example

[root@siddhesh ~]# cat /usr/local/bin/Token_Counting.py
import tiktoken

# Load the tokenizer used by GPT-4
enc = tiktoken.encoding_for_model("gpt-4")
text = "Builddevops is amazing!"

tokens = enc.encode(text)
print("==== Full Sentence ====")
print("Text:", text)
print("Token count:", len(tokens))
print("Tokens:", tokens)

print("\n==== Word-wise Tokens ====")
words = text.split()
for word in words:
    word_tokens = enc.encode(word)
    decoded = [enc.decode([t]) for t in word_tokens]
    print(f"\nWord: {word}")
    print(f"Token count: {len(word_tokens)}")
    print(f"Token IDs: {word_tokens}")
    print(f"Decoded Tokens: {decoded}")
[root@siddhesh ~]#

Output

[root@siddhesh ~]# python3 /usr/local/bin/Token_Counting.py
==== Full Sentence ====
Text: Builddevops is amazing!
Token count: 6
Tokens: [11313, 3667, 3806, 374, 8056, 0]
==== Word-wise Tokens ====
Word: Builddevops
Token count: 3
Token IDs: [11313, 3667, 3806]
Decoded Tokens: ['Build', 'dev', 'ops']
Word: is
Token count: 1
Token IDs: [285]
Decoded Tokens: ['is']
Word: amazing!
Token count: 3
Token IDs: [309, 6795, 0]
Decoded Tokens: ['am', 'azing', '!']
[root@siddhesh ~]#

🚀 Key Takeaways


✔️ AI models process tokens, not words

✔️ Tokenisation converts text into numerical form

✔️ Pricing depends on total tokens used

✔️ Always divide tokens by 1000 for cost calculation

✔️ Managing tokens improves efficiency and cost control


Conclusion

Tokenisation is a fundamental concept in modern AI systems. It affects how models understand input, generate output, and calculate usage costs.

A clear understanding of tokens helps in designing efficient, scalable, and cost-aware AI applications.
