Explain tokens, patterns, and lexemes. Demonstrate the same with examples.

Tokens, Patterns, and Lexemes

In lexical analysis, token, pattern, and lexeme are the fundamental terms that describe how the lexer (lexical analyzer) converts source code into meaningful units.


1. Token

A token represents a category (class) of lexical units in the language.
It is a pair consisting of:

  • Token name – Abstract label (like id, number, if, etc.)
  • Attribute value (optional) – Extra information about the matched lexeme, such as the variable name for an id token

Think of a token like a label or tag for a specific part of code.
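
As a minimal sketch of the (token name, attribute value) pair described above, a token can be modelled as a small record. The class name `Token` and its field names are illustrative assumptions, not part of any particular compiler.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    """Hypothetical record for a token: a name plus an optional attribute."""
    name: str                        # abstract label, e.g. "id", "number", "if"
    attribute: Optional[str] = None  # optional extra info, e.g. the lexeme


# An identifier token carries the variable name as its attribute.
tok_id = Token("id", "score")

# A keyword token needs no attribute; the name alone says everything.
tok_if = Token("if")

print(tok_id)
print(tok_if)
```

Keywords usually need no attribute because the token name already identifies the lexeme, whereas an id token needs one so that later phases can tell one variable from another.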


2. Pattern

A pattern is a rule or description that defines what a valid lexeme for a token looks like.
It can be:

  • A fixed string (e.g., "if" for the keyword if)
  • A regular expression (e.g., [a-zA-Z_][a-zA-Z0-9_]* for identifiers)
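
The two kinds of pattern listed above can be tried out with Python's `re` module. This is only a sketch: the dictionary `PATTERNS` and the helper `matches` are hypothetical names chosen for illustration.

```python
import re

# Patterns keyed by token name: a fixed string and a regular expression.
PATTERNS = {
    "if": re.compile(r"if"),
    "id": re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*"),
}


def matches(token_name, text):
    """Return True if the whole of `text` fits the pattern for `token_name`."""
    return PATTERNS[token_name].fullmatch(text) is not None


print(matches("if", "if"))     # True
print(matches("id", "score"))  # True
print(matches("id", "2D"))     # False: an identifier cannot start with a digit
print(matches("id", "if"))     # True: keywords also fit the identifier pattern
```

The last line shows why lexers typically give keyword patterns priority over the identifier pattern: the string if fits both.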

3. Lexeme

A lexeme is the actual sequence of characters in the source code that matches the pattern for a token and is recognized as an instance of that token.
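
To make the distinction concrete, the short sketch below pulls the identifier lexemes out of one line of source text using the identifier pattern. The variable names and the sample line are assumptions made for illustration.

```python
import re

# Pattern for the id token (a letter or underscore, then letters/digits/underscores).
ID_PATTERN = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")

source_line = "score = pi * D2"

# Each match is a lexeme: a concrete piece of the source text.
lexemes = ID_PATTERN.findall(source_line)
print(lexemes)  # ['score', 'pi', 'D2']
```

All three lexemes are different strings, yet each one is an instance of the same token, id.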


Table Example – Tokens, Patterns, and Lexemes

Token Name  | Pattern (Description)                | Sample Lexemes
------------+--------------------------------------+--------------------
if          | The characters i, f                  | if
else        | The characters e, l, s, e            | else
comparison  | <, >, <=, >=, ==, !=                 | >=, ==
id          | A letter followed by letters/digits  | score, pi, D2
number      | Any numeric constant                 | 3.14, 6.02e23, 0
literal     | Any string enclosed in quotes        | "Total =", "Hello"

Example Statement:

printf("Total = ", score);

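Lexeme      | Token Name | Pattern Description
------------+------------+-----------------------------------------------------------
printf      | id         | Identifier: starts with letter, followed by alphanumerics
"Total = "  | literal    | String enclosed in quotes
score       | id         | Identifier
(, ), ,     | Symbols    | Each treated as individual tokens
;           | Symbol     | Token for semicolon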
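
The breakdown of the example statement can be reproduced with a small regex-based scanner. This is a sketch under stated assumptions, not a production lexer: the names `TOKEN_SPEC`, `MASTER`, and `tokenize` are hypothetical, the exact regular expressions (especially the one for number) are one possible reading of the informal pattern descriptions, and all punctuation is grouped under a single symbol token.

```python
import re

# (token name, pattern) pairs. Order matters: earlier entries win ties.
TOKEN_SPEC = [
    ("literal", r'"[^"]*"'),                          # string enclosed in quotes
    ("number",  r"\d+(?:\.\d+)?(?:[eE][+-]?\d+)?"),   # assumed numeric format
    ("id",      r"[a-zA-Z_][a-zA-Z0-9_]*"),           # identifier
    ("symbol",  r"[(),;]"),                           # each is its own token
    ("skip",    r"\s+"),                              # whitespace, discarded
]

# Combine all patterns into one regex with a named group per token.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (lexeme, token name) pairs for `source`."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if m.lastgroup != "skip":
            tokens.append((m.group(), m.lastgroup))
        pos = m.end()
    return tokens


result = tokenize('printf("Total = ", score);')
for lexeme, name in result:
    print(f"{lexeme!r:14} {name}")
```

The scanner yields the lexemes in source order: printf (id), ( (symbol), "Total = " (literal), , (symbol), score (id), ) (symbol), ; (symbol). Keywords such as if and else are left out of this sketch; they would be listed before the id entry so that they take priority.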

Summary:

  • Token: Type/label of lexical unit (id, number, if, etc.)
  • Pattern: Rule that defines valid strings for the token
  • Lexeme: Actual string from source code that matches a pattern
